UG-PSYCHOLOGY, SEMESTER-3, MJC-3, ALL

UNIT-1 (1.2) Variables: Meaning and Types (Categorical and Continuous)

चर (Variables): अर्थ और प्रकार (श्रेणीय और निरंतर)


मापन के स्तर (Levels of Measurement): नाममात्रिक, क्रमबद्ध, अंतरिक, और अनुपात

वर्णनात्मक और अनुमानात्मक सांख्यिकी की मूल अवधारणाएँ

डेटा का आवृत्ति वितरण और उसका ग्राफ़िक प्रस्तुतीकरण: हिस्टोग्राम, पॉलीगॉन और ओगिव

केन्द्रीय प्रवृत्ति के माप: माध्य, माध्यिका और बहुलक की गणना

प्रसरण के माप: सीमा (Range), चतुर्थक विचलन (QD), माध्य परास विचलन (AD), और मानक विचलन (SD) की गणना

सहसंबंध (Correlation): अवधारणा और प्रकार

सहसंबंध की गणना: उत्पाद-मोमेंट विधि और रैंक-अंतर विधि

t-परिक्षण (t-test) की गणना: स्वतंत्र समूह और सहसंबद्ध समूह

काइ-स्क्वायर परीक्षण (Chi-Square Test): अवधारणा, प्रकार और अनुप्रयोग

Chi-Square Test: Concept, Types, and Applications

काइ-स्क्वायर (Chi-Square) की गणना: समान वितरण परिकल्पना और स्वतंत्र परिकल्पना

Computation of Chi-Square: Equal Distribution Hypothesis and Independent Hypothesis

UnNoticed Digital College March 2, 2025


UNIT-1 (1.1) Meaning and Uses of Statistics in Psychology

मनोविज्ञान में सांख्यिकी का अर्थ और उपयोग

परिचय

सांख्यिकी मनोविज्ञान में एक महत्वपूर्ण उपकरण है, जो शोधकर्ताओं को डेटा का विश्लेषण करने, परिकल्पनाओं का परीक्षण करने और मानव विचारों, भावनाओं और व्यवहारों के बारे में सार्थक निष्कर्ष निकालने में सहायता करता है। यह संख्यात्मक डेटा को व्यवस्थित करने, संक्षेप में प्रस्तुत करने और उसकी व्याख्या करने के लिए एक सुव्यवस्थित दृष्टिकोण प्रदान करती है। मनोवैज्ञानिक अनुसंधान अक्सर जटिल डेटा सेट से जुड़ा होता है, और सांख्यिकीय विधियाँ यह सुनिश्चित करने में मदद करती हैं कि निष्कर्ष विश्वसनीय और वैध हों तथा बड़ी जनसंख्या पर लागू किए जा सकें।

यह निबंध मनोविज्ञान में सांख्यिकी के अर्थ और विभिन्न उपयोगों का अन्वेषण करेगा, जिसमें डेटा संग्रह, परिकल्पना परीक्षण, सहसंबंध विश्लेषण, अनुमानात्मक सांख्यिकी, और नैदानिक, संज्ञानात्मक, सामाजिक और विकासात्मक मनोविज्ञान में इसके वास्तविक जीवन अनुप्रयोग शामिल हैं।

मनोविज्ञान में सांख्यिकी का अर्थ

सांख्यिकी की परिभाषा

सांख्यिकी गणित की एक शाखा है जो संख्यात्मक डेटा के संग्रह, संगठन, विश्लेषण, व्याख्या और प्रस्तुति से संबंधित है। मनोविज्ञान में, सांख्यिकी का उपयोग मानव व्यवहार और मानसिक प्रक्रियाओं में पैटर्न को समझने के लिए किया जाता है।

मनोविज्ञान में सांख्यिकी के प्रकार

मनोविज्ञान में सांख्यिकी को मुख्य रूप से दो भागों में विभाजित किया जाता है:

1. वर्णनात्मक सांख्यिकी (Descriptive Statistics) – इसका उपयोग डेटा को संक्षेप में प्रस्तुत करने और व्यवस्थित करने के लिए किया जाता है। इसमें निम्नलिखित माप शामिल हैं:
- माध्य (Mean): सभी मानों का औसत, जो डेटा सेट के केंद्रीय मान का प्रतिनिधित्व करता है।
- माध्यिका (Median): डेटा को क्रम में रखने पर बीच में आने वाला मान।
- बहुलक (Mode): सबसे अधिक बार आने वाला मान।
- मानक विचलन (Standard Deviation): यह मापता है कि मान माध्य से औसतन कितने दूर हैं।
- परास (Range) और प्रसरण (Variance): डेटा बिंदुओं के प्रसार को इंगित करते हैं।
2. अनुमानात्मक सांख्यिकी (Inferential Statistics) – इसका उपयोग नमूना डेटा के आधार पर पूरी जनसंख्या के बारे में निष्कर्ष निकालने के लिए किया जाता है। इसमें शामिल हैं:
- परिकल्पना परीक्षण (Hypothesis Testing): यह आकलन करना कि प्राप्त परिणाम केवल संयोग से हो सकते हैं या वास्तविक प्रभाव का संकेत देते हैं।
- सहसंबंध विश्लेषण (Correlation Analysis): चरों (Variables) के बीच संबंधों का आकलन करना।
- प्रतिगमन विश्लेषण (Regression Analysis): एक या अधिक चरों के आधार पर परिणामों की भविष्यवाणी करना।
- टी-टेस्ट और एनोवा (t-tests और ANOVA): समूहों के बीच अंतर की तुलना करना।

वर्णनात्मक और अनुमानात्मक सांख्यिकी, दोनों, मनोवैज्ञानिकों को प्रयोगात्मक डेटा का विश्लेषण करने और अनुभवजन्य साक्ष्यों के आधार पर सूचित निर्णय लेने में मदद करती हैं।

मनोविज्ञान में सांख्यिकी के उपयोग

सांख्यिकी मनोविज्ञान में कई तरीकों से महत्वपूर्ण भूमिका निभाती है। नीचे मनोवैज्ञानिक अनुसंधान और व्यवहार में सांख्यिकी के मुख्य उपयोगों को समझाया गया है:

1. डेटा को व्यवस्थित और संक्षेप में प्रस्तुत करना

मनोवैज्ञानिक अनुसंधान में अक्सर प्रयोगों, सर्वेक्षणों या अवलोकनों से बड़े पैमाने पर डेटा एकत्र किया जाता है। वर्णनात्मक सांख्यिकी इस डेटा को एक सार्थक प्रारूप में व्यवस्थित करने में मदद करती है।

उदाहरण:

एक मनोवैज्ञानिक जो कॉलेज के छात्रों में चिंता के स्तर का अध्ययन कर रहा है, वह चिंता स्कोर के वितरण को दर्शाने के लिए हिस्टोग्राम (Histogram) और पाई चार्ट (Pie Chart) का उपयोग कर सकता है।

2. परिकल्पना का परीक्षण और निष्कर्ष निकालना

अनुमानात्मक सांख्यिकी मनोवैज्ञानिकों को परिकल्पनाओं का परीक्षण करने और यह निर्धारित करने में मदद करती है कि उनके निष्कर्ष सांख्यिकीय रूप से महत्वपूर्ण हैं या नहीं।

उदाहरण:

यदि एक शोधकर्ता ध्यान (Meditation) के तनाव पर प्रभाव का अध्ययन कर रहा है, तो वह एक प्रयोग कर सकता है और टी-टेस्ट (t-test) का उपयोग करके ध्यान समूह और नियंत्रण समूह के तनाव स्तरों की तुलना कर सकता है। यदि ध्यान समूह का औसत तनाव कम है और p-मान (p-value) 0.05 से कम है, तो यह अंतर 5% स्तर पर सांख्यिकीय रूप से सार्थक माना जाता है, जो इस निष्कर्ष का समर्थन करता है कि ध्यान तनाव को कम करता है।

3. चर के बीच संबंध को मापना (सहसंबंध विश्लेषण)

सांख्यिकी मनोवैज्ञानिकों को विभिन्न मनोवैज्ञानिक चर के बीच संबंधों को समझने में मदद करती है।

उदाहरण:

एक मनोवैज्ञानिक जो नींद की कमी और स्मरणशक्ति (Memory) के बीच संबंध की जांच कर रहा है, वह एक नकारात्मक सहसंबंध (Negative Correlation) पा सकता है, जो दर्शाता है कि नींद जितनी कम होती है, स्मरणशक्ति का प्रदर्शन उतना ही कमजोर पाया जाता है।

हालांकि, सहसंबंध हमेशा कारण और प्रभाव (Causation) को इंगित नहीं करता है।

4. प्रतिगमन विश्लेषण के माध्यम से पूर्वानुमान लगाना

प्रतिगमन विश्लेषण (Regression Analysis) एक या अधिक चर के आधार पर भविष्यवाणी करने में मदद करता है।

उदाहरण:

एक मनोवैज्ञानिक किसी छात्र की आईक्यू, प्रेरणा और अध्ययन की आदतों के आधार पर उसके शैक्षणिक प्रदर्शन (Academic Performance) का पूर्वानुमान लगाने के लिए प्रतिगमन विश्लेषण का उपयोग कर सकता है।

5. समूहों की तुलना करना (T-Tests और ANOVA)

टी-टेस्ट और एनोवा का उपयोग यह निर्धारित करने के लिए किया जाता है कि क्या समूहों के बीच महत्वपूर्ण अंतर हैं।

उदाहरण:

यदि कोई सामाजिक मनोवैज्ञानिक समूह चिकित्सा (Group Therapy) और व्यक्तिगत परामर्श (Individual Counseling) की प्रभावशीलता की तुलना कर रहा है, तो वह टी-टेस्ट का उपयोग करके यह पता लगा सकता है कि किस विधि से अवसाद में अधिक सुधार हुआ।

6. मनोवैज्ञानिक परीक्षणों का मूल्यांकन (Psychometrics)

मनोवैज्ञानिक परीक्षणों की विश्वसनीयता (Reliability) और वैधता (Validity) का मूल्यांकन करने के लिए सांख्यिकी का उपयोग किया जाता है।

उदाहरण:

बिग फाइव पर्सनैलिटी टेस्ट (Big Five Personality Test) को यह सुनिश्चित करने के लिए सांख्यिकीय तकनीकों से परखा जाता है कि यह व्यक्तित्व लक्षणों को सही ढंग से माप रहा है।

7. नैदानिक मनोविज्ञान में उपयोग

सांख्यिकी का उपयोग नैदानिक मनोविज्ञान (Clinical Psychology) में निदान, उपचार मूल्यांकन और रोग के पूर्वानुमान के लिए किया जाता है।

उदाहरण:

एक मनोवैज्ञानिक यह मापने के लिए सांख्यिकीय विश्लेषण का उपयोग कर सकता है कि अवसाद उपचार के बाद मरीजों की स्थिति में कितना सुधार हुआ है।

8. संज्ञानात्मक और तंत्रिका विज्ञान अनुसंधान में उपयोग

सांख्यिकी का उपयोग मस्तिष्क इमेजिंग डेटा, प्रतिक्रिया समय और संज्ञानात्मक प्रदर्शन का विश्लेषण करने के लिए किया जाता है।

उदाहरण:

एक संज्ञानात्मक मनोवैज्ञानिक स्मृति अध्ययन (Memory Study) के लिए सांख्यिकीय मॉडल का उपयोग कर सकता है।

निष्कर्ष

सांख्यिकी मनोविज्ञान में एक आवश्यक उपकरण है जो शोधकर्ताओं को डेटा का विश्लेषण करने, परिकल्पनाओं का परीक्षण करने और व्यवहार को समझने में सहायता करता है। यह वैज्ञानिक अनुसंधान की विश्वसनीयता और वैधता सुनिश्चित करता है। हालांकि, सांख्यिकी का सावधानीपूर्वक उपयोग किया जाना चाहिए ताकि डेटा की गलत व्याख्या न हो।

इस प्रकार, सांख्यिकी के उचित उपयोग से मनोवैज्ञानिक सिद्धांतों, उपचारों और हस्तक्षेपों में सुधार किया जा सकता है।

UNIT-1 (1.1) Meaning and Uses of Statistics in Psychology

Introduction

Statistics is an essential tool in psychology, enabling researchers to analyze data, test hypotheses, and draw meaningful conclusions about human thoughts, emotions, and behaviors. It provides a systematic approach to understanding psychological phenomena by organizing, summarizing, and interpreting numerical data. Psychological research often involves complex data sets, and statistical methods help ensure that findings are reliable, valid, and generalizable to larger populations.

This essay explores the meaning of statistics in psychology and its various uses, including data collection, hypothesis testing, correlation analysis, inferential statistics, and real-world applications in clinical, cognitive, social, and developmental psychology.

Meaning of Statistics in Psychology

Definition of Statistics

Statistics is a branch of mathematics that deals with the collection, organization, analysis, interpretation, and presentation of numerical data. In psychology, statistics is used to understand patterns in human behavior and mental processes through empirical research.

Types of Statistics in Psychology

Statistics in psychology can be broadly categorized into two types:

1. Descriptive Statistics – Used to summarize and organize data. Common measures include:
- Mean (average): The sum of all values divided by the number of values.
- Median: The middle value when the data are arranged in order.
- Mode: The most frequently occurring value.
- Standard Deviation: The typical distance of the values from the mean.
- Range and Variance: Indicate the spread of data points.
2. Inferential Statistics – Used to make predictions or generalizations about a population based on sample data. Common methods include:
- Hypothesis Testing: Assessing whether an observed result is larger than would be expected from chance alone.
- Correlation Analysis: Assessing relationships between variables.
- Regression Analysis: Predicting outcomes based on one or more variables.
- t-tests and ANOVA: Comparing group differences.

Both descriptive and inferential statistics help psychologists analyze experimental data and make informed decisions based on empirical evidence.
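As a small illustration of the descriptive measures listed above, the following sketch computes them with Python's standard `statistics` module. The anxiety scores are hypothetical and chosen only to keep the arithmetic easy to follow; `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas.

```python
import statistics

# Hypothetical anxiety scores for ten students (illustrative data only).
scores = [12, 15, 15, 18, 20, 22, 15, 25, 30, 18]

mean = statistics.mean(scores)           # arithmetic average
median = statistics.median(scores)       # middle value of the sorted scores
mode = statistics.mode(scores)           # most frequently occurring score
data_range = max(scores) - min(scores)   # largest score minus smallest score
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)       # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range)
print("variance:", round(variance, 2), "SD:", round(std_dev, 2))
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` would be the corresponding choices.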

Uses of Statistics in Psychology

Statistics plays a crucial role in psychology in several ways. Below are the primary uses of statistics in psychological research and practice:

1. Organizing and Summarizing Data

Psychological research often involves large amounts of data collected from experiments, surveys, or observations. Descriptive statistics help in organizing this data into a meaningful format. For example, psychologists use frequency distributions, graphs, and tables to present findings in a clear and concise manner.

Example:

A psychologist studying anxiety levels in college students may use histograms and pie charts to visually represent the distribution of anxiety scores.
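A grouped frequency distribution, which is what a histogram displays, can be sketched in a few lines. The scores, the class width of 5, and the helper name `class_interval` are assumptions made for illustration; the text bars stand in for the histogram's columns.

```python
from collections import Counter

# Hypothetical anxiety scores (illustrative data only).
scores = [12, 15, 15, 18, 20, 22, 15, 25, 30, 18]
WIDTH = 5  # assumed class-interval width


def class_interval(score, width):
    """Return the (lower, upper) limits of the class interval holding score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)


# Count how many scores fall into each class interval.
frequency = Counter(class_interval(s, WIDTH) for s in scores)

# Print a text histogram, one row per class interval, in ascending order.
for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower:>2}-{upper:<2} | {'*' * count} ({count})")
```

The choice of class width changes the shape of the display, so in practice it is worth trying more than one width before interpreting the distribution.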

2. Testing Hypotheses and Drawing Conclusions

Inferential statistics allow psychologists to test hypotheses and determine whether their findings are statistically significant. This process involves setting up null and alternative hypotheses and using a statistical test to decide whether the data provide enough evidence to reject the null hypothesis.

Example:

A researcher studying the effect of meditation on stress reduction may conduct an experiment and use a t-test to compare stress levels between a meditation group and a control group. If the meditation group's mean stress is lower and the p-value is below 0.05, the difference is statistically significant at the 5% level, which supports the conclusion that meditation reduces stress.
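The comparison in this example can be sketched by hand as an independent-samples t-test with pooled variance. The stress scores and the function name `independent_t` are hypothetical. Because the standard library has no t-distribution, the sketch compares the statistic with the tabled two-tailed critical value for df = 8 at the 0.05 level (2.306) instead of computing a p-value.

```python
import math
import statistics

# Hypothetical stress scores (illustrative data only).
meditation = [14, 12, 15, 11, 13]
control = [18, 17, 20, 16, 19]


def independent_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


t_stat, df = independent_t(meditation, control)

# Two-tailed critical value of t for df = 8 at alpha = 0.05 (from a t table).
CRITICAL_T = 2.306
significant = abs(t_stat) > CRITICAL_T

print(f"t({df}) = {t_stat:.2f}, significant at .05: {significant}")
```

The pooled-variance form assumes the two groups have similar variances; when that assumption is doubtful, Welch's version of the test is the usual alternative.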

3. Measuring Relationships Between Variables (Correlation Analysis)

Statistics helps psychologists understand the relationships between different psychological variables. Correlation analysis measures the strength and direction of the linear relationship between two variables, expressed as a coefficient between -1 and +1.

Example:

A psychologist investigating the relationship between sleep deprivation and memory performance may find a negative correlation, indicating that less sleep is associated with poorer memory retention.

However, correlation does not imply causation. Other factors may influence the observed relationship, and further research is often needed.
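A negative correlation of the kind described in the sleep example can be illustrated with a hand-written Pearson product-moment coefficient. The data and the function name `pearson_r` are hypothetical.

```python
import math

# Hypothetical data: hours without sleep and words recalled (illustrative only).
hours_awake = [16, 18, 20, 22, 24]
words_recalled = [18, 17, 14, 12, 9]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the two means.
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)


r = pearson_r(hours_awake, words_recalled)
print(f"r = {r:.3f}")  # close to -1: a strong negative linear relationship
```

A coefficient near -1 only describes how closely the points follow a downward line; it says nothing about which variable, if either, is the cause.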

4. Making Predictions Using Regression Analysis

Regression analysis helps psychologists predict outcomes based on one or more variables. This is especially useful in clinical psychology, where predicting patient behavior can aid in treatment planning.

Example:

A psychologist may use regression analysis to predict a student’s academic performance based on factors like IQ, motivation, and study habits.
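The example above involves several predictors; to keep the arithmetic visible, the following sketch fits a least-squares line with a single predictor. The data (weekly study hours and exam scores) and the function name `fit_line` are hypothetical, and multiple regression extends the same idea to more predictors.

```python
# Hypothetical data: weekly study hours and exam scores (illustrative only).
study_hours = [2, 4, 6, 8, 10]
exam_scores = [55, 60, 70, 75, 85]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sum_products / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(study_hours, exam_scores)

# Predicted exam score for a student who studies 7 hours per week.
predicted = intercept + slope * 7
print(f"score = {intercept:.2f} + {slope:.2f} * hours; prediction for 7 h: {predicted:.2f}")
```

Predictions are most trustworthy inside the range of the observed predictor values; extrapolating far beyond them is risky.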

5. Comparing Groups Using T-tests and ANOVA

Statistical tests like the t-test and ANOVA (Analysis of Variance) are used to determine whether there are significant differences between groups.

Example:

A social psychologist studying the effects of group therapy on depression may compare depression scores between participants in group therapy and those receiving individual counseling. A t-test can reveal whether the differences in depression scores are statistically significant.

ANOVA is useful when comparing more than two groups, such as studying the effects of different teaching methods on student performance.
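For three or more groups, the F ratio of a one-way ANOVA compares the variability between group means with the variability within groups. The sketch below uses hypothetical scores for three teaching methods and a hypothetical function name `one_way_anova`; the statistic is compared with the tabled critical value F(2, 12) = 3.89 at the 0.05 level.

```python
import statistics

# Hypothetical exam scores under three teaching methods (illustrative only).
method_a = [70, 72, 68, 74, 66]
method_b = [75, 78, 74, 77, 76]
method_c = [80, 85, 82, 84, 79]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (score - statistics.mean(group)) ** 2 for group in groups for score in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_between, df_within = one_way_anova([method_a, method_b, method_c])

# Critical value of F for (2, 12) degrees of freedom at alpha = 0.05 (from an F table).
CRITICAL_F = 3.89
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, significant: {f_ratio > CRITICAL_F}")
```

A significant F only indicates that at least one group mean differs; follow-up comparisons are needed to find out which ones.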

6. Evaluating Psychological Tests and Scales (Psychometrics)

Statistics plays a vital role in developing and validating psychological tests, such as intelligence tests, personality assessments, and mental health screenings. Psychometricians use statistical techniques to assess:

- Reliability: The consistency of test scores, for example across repeated administrations over time.
- Validity: Whether the test measures what it claims to measure.

Example:

Big Five personality inventories are evaluated using factor analysis to check that their items group into the five intended traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.

7. Statistical Methods in Psychological Research Designs

Statistics helps psychologists design experiments that minimize biases and errors. Common research designs include:

- Experimental Design: Involves random assignment of participants to experimental and control groups to determine cause-and-effect relationships.
- Quasi-Experimental Design: Used when random assignment is not possible, often in real-world settings.
- Longitudinal Studies: Analyze changes in behavior over time using repeated observations.
- Cross-Sectional Studies: Compare different groups at a single point in time.

Example:

A developmental psychologist studying cognitive decline in aging adults may conduct a longitudinal study, tracking participants’ memory performance over decades.

8. Clinical Applications of Statistics

Statistics is widely used in clinical psychology for diagnosis, treatment evaluation, and prognosis. Clinicians rely on statistical data to determine the effectiveness of therapies and interventions.

Example:

A clinical psychologist treating depression may use statistical analysis to compare the effectiveness of cognitive-behavioral therapy (CBT) versus medication by analyzing patients’ depression scores before and after treatment.

Statistical models also help in predicting patient relapse rates and identifying risk factors for mental illnesses.

9. Statistical Applications in Social Psychology

Social psychologists use statistical methods to analyze group behavior, attitudes, and social interactions. Common applications include:

- Studying prejudice and discrimination using survey data.
- Analyzing voting patterns and political opinions.
- Examining the effects of social media on self-esteem and mental health.

Example:

A social psychologist studying conformity may use statistical tests to analyze how group pressure influences individual decision-making.

10. Neuroscience and Cognitive Psychology Applications

In cognitive and neuroscience research, statistics are used to analyze brain imaging data, reaction times, and cognitive performance. Techniques like fMRI and EEG generate massive datasets that require sophisticated statistical analysis.

Example:

A cognitive psychologist studying memory may use statistical models to analyze reaction time data from memory recall tasks.

Challenges and Limitations of Using Statistics in Psychology

While statistics provide valuable insights, there are limitations to their use in psychology:

- Misinterpretation of Data: Correlation does not imply causation, and researchers must be cautious in drawing conclusions.
- Sampling Bias: Results may not be generalizable if the sample is not representative of the larger population.
- Overreliance on p-values: Statistical significance does not always indicate practical significance.
- Ethical Concerns: In psychological research, data collection and analysis must adhere to ethical guidelines to ensure participant privacy and informed consent.
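One common way to look beyond the p-value is to report an effect size. The sketch below computes Cohen's d, the difference between two group means expressed in pooled standard deviation units. The scores and the function name `cohens_d` are hypothetical, and the benchmarks in the comment (about 0.2 small, 0.5 medium, 0.8 large) are conventional rules of thumb rather than strict cut-offs.

```python
import math
import statistics

# Hypothetical depression scores for two treatment groups (illustrative only).
therapy_a = [10, 14, 18, 22, 26]
therapy_b = [11, 15, 19, 23, 27]


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


d = cohens_d(therapy_a, therapy_b)

# Conventional benchmarks: about 0.2 small, 0.5 medium, 0.8 large.
print(f"Cohen's d = {d:.3f}")
```

With a large enough sample, even an effect this small could reach statistical significance, which is why the effect size is usually reported alongside the p-value.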

Conclusion

Statistics is a fundamental tool in psychology, aiding researchers and practitioners in organizing data, testing hypotheses, measuring relationships, and making predictions. From clinical applications to social and cognitive research, statistical methods provide valuable insights into human behavior and mental processes. However, psychologists must be cautious in interpreting statistical results and acknowledge the limitations of statistical analysis.

By applying statistics responsibly, psychologists can enhance the validity and reliability of their findings, ultimately improving psychological theories, therapies, and interventions.

UNIT-1 (1.2) Variables: Meaning and Types (Categorical and Continuous)

चर (Variables): अर्थ और प्रकार (श्रेणीय और निरंतर)

परिचय

चर (Variables) किसी भी अनुसंधान और डेटा विश्लेषण का एक महत्वपूर्ण हिस्सा होते हैं। वे वैज्ञानिक जांचों की नींव होते हैं और मनोविज्ञान, सामाजिक विज्ञान, और प्राकृतिक विज्ञान जैसे विभिन्न क्षेत्रों में उपयोग किए जाते हैं। चर को समझना अनुसंधान डिजाइन, डेटा विश्लेषण, और परिणामों की व्याख्या करने के लिए आवश्यक है।

यह निबंध चर के अर्थ, अनुसंधान में उनके महत्व और उनके मुख्य प्रकारों (श्रेणीय और निरंतर) की विस्तृत व्याख्या करेगा। साथ ही, इनका अनुप्रयोग, उदाहरण और प्रमुख अंतर भी बताए जाएंगे।

चर का अर्थ

चर की परिभाषा

चर (Variable) वह विशेषता, संख्या, या मात्रा होती है जो विभिन्न मान ले सकती है। यह वह तत्व है जिसे शोधकर्ता किसी अध्ययन में मापते, नियंत्रित करते या विश्लेषण करते हैं। चर प्रयोगों में विभिन्न पहलुओं जैसे गुण, व्यवहार, स्थिति या परिणाम का प्रतिनिधित्व कर सकते हैं।

उदाहरण:

- तनाव (Stress) पर किए गए एक अध्ययन में, तनाव स्तर एक चर है जो कम से उच्च तक भिन्न हो सकता है।
- रक्तचाप (Blood Pressure) पर किए गए एक अध्ययन में, रक्तचाप रीडिंग एक चर है जो व्यक्ति-दर-व्यक्ति भिन्न हो सकती है।

अनुसंधान में चरों का महत्व

चर निम्नलिखित कार्यों में सहायक होते हैं:

- विभिन्न कारकों के बीच संबंध पहचानना (जैसे, क्या नींद की अवधि स्मरणशक्ति को प्रभावित करती है?)।
- परिकल्पनाओं का परीक्षण और सिद्धांतों की पुष्टि करना।
- प्रयोगों में होने वाले परिवर्तनों को मापना।
- डेटा विश्लेषण के आधार पर भविष्यवाणियां करना।

उदाहरण के लिए, यदि कोई शोधकर्ता यह जांचना चाहता है कि क्या कोई नई दवा चिंता (Anxiety) को कम करती है, तो वह खुराक की मात्रा (स्वतंत्र चर) में बदलाव करेगा और चिंता स्तर (आश्रित चर) को मापेगा।

चर के प्रकार

चर को उनके मापन और अनुसंधान में उपयोग के आधार पर दो प्रमुख वर्गों में बांटा गया है:

1. श्रेणीय चर (Categorical Variables) – गुणात्मक (Qualitative)
2. निरंतर चर (Continuous Variables) – मात्रात्मक (Quantitative)

1. श्रेणीय चर (Categorical Variables)

परिभाषा

श्रेणीय चर वे होते हैं जो किसी विशेषता या गुणवत्ता को दर्शाते हैं और जिन्हें विभिन्न श्रेणियों या समूहों में वर्गीकृत किया जाता है। इन श्रेणियों को गिना जा सकता है, लेकिन इन पर जोड़ या औसत जैसी अंकगणितीय क्रियाएँ अर्थपूर्ण नहीं होतीं।

उदाहरण:

- लिंग (Gender): पुरुष, महिला, अन्य
- वैवाहिक स्थिति (Marital Status): अविवाहित, विवाहित, तलाकशुदा, विधवा
- आंखों का रंग (Eye Color): भूरा, नीला, हरा

श्रेणीय चरों के प्रकार

श्रेणीय चर को दो प्रकारों में विभाजित किया जाता है:

A. नाममात्रिक चर (Nominal Variables)

परिभाषा: नाममात्रिक चर वे होते हैं जिनमें कोई स्वाभाविक क्रम या रैंकिंग नहीं होती।
उदाहरण:
- रक्त समूह (A, B, AB, O)
- धर्म (हिंदू, मुस्लिम, ईसाई, सिख)
- राजनीतिक दल (कांग्रेस, भाजपा, आम आदमी पार्टी)
मुख्य विशेषता: श्रेणियां एक-दूसरे से अलग होती हैं, लेकिन उनका कोई विशेष क्रम नहीं होता।

B. क्रमबद्ध चर (Ordinal Variables)

परिभाषा: क्रमबद्ध चर वे होते हैं जिनकी श्रेणियां एक निश्चित क्रम में होती हैं, लेकिन क्रमागत श्रेणियों के बीच का अंतर समान या मापने योग्य नहीं होता।
उदाहरण:
- शिक्षा स्तर (प्राथमिक, माध्यमिक, स्नातक, परास्नातक)
- सामाजिक-आर्थिक स्थिति (निम्न, मध्यम, उच्च)
- ग्राहक संतोष (बहुत असंतुष्ट, असंतुष्ट, संतुष्ट, बहुत संतुष्ट)
मुख्य विशेषता: श्रेणियां क्रम में व्यवस्थित होती हैं, लेकिन उनके बीच का अंतर निश्चित नहीं होता।

श्रेणीय चरों का अनुप्रयोग

श्रेणीय चर का उपयोग मुख्य रूप से निम्नलिखित में किया जाता है:

- सर्वेक्षण अनुसंधान (जैसे, जनसांख्यिकीय डेटा एकत्र करना)।
- बाजार अनुसंधान (जैसे, ग्राहक प्राथमिकताएं)।
- मेडिकल अध्ययन (जैसे, रोगों का वर्गीकरण)।

उदाहरण के लिए, एक मनोवैज्ञानिक यदि मानसिक स्वास्थ्य विकारों का अध्ययन कर रहा है, तो वह मरीजों को “चिंता,” “अवसाद,” और “द्विध्रुवीय विकार” जैसी श्रेणियों में वर्गीकृत कर सकता है।

2. निरंतर चर (Continuous Variables)

परिभाषा

निरंतर चर वे होते हैं जिनमें संख्यात्मक मान होते हैं और इन्हें एक पैमाने पर मापा जा सकता है। ये चर किसी भी सीमा के भीतर अनगिनत मान ले सकते हैं, इसलिए दो मानों के बीच बहुत सूक्ष्म भिन्नताएँ भी संभव हैं।

उदाहरण:

- ऊंचाई (Height): सेमी या इंच में मापी जाती है।
- वजन (Weight): किलोग्राम या पाउंड में मापा जाता है।
- आयु (Age): वर्षों, महीनों या दिनों में मापी जाती है।

परिभाषा: अंतरिक चर वे होते हैं जिनमें मूल्यों के बीच का अंतर समान होता है, लेकिन इनमें एक वास्तविक शून्य बिंदु नहीं होता।
उदाहरण:
- तापमान (°C या °F) – 0°C का अर्थ “कोई तापमान नहीं” नहीं होता।
- IQ स्कोर – 0 IQ का अर्थ “बुद्धिमत्ता का पूर्ण अभाव” नहीं होता।
मुख्य विशेषता: मूल्यों के बीच समान अंतर होता है, लेकिन शून्य का अर्थ “कुछ न होना” नहीं होता।

परिभाषा: अनुपात चर वे होते हैं जिनमें सभी अंतरिक चर की विशेषताएँ होती हैं, लेकिन इनमें एक वास्तविक शून्य बिंदु होता है।
उदाहरण:
- ऊंचाई (0 सेमी का अर्थ कोई ऊंचाई नहीं)।
- वजन (0 किग्रा का अर्थ कोई वजन नहीं)।
- आय (0 रुपये का अर्थ कोई आय नहीं)।
मुख्य विशेषता: इन मूल्यों को अनुपातों में तुलना किया जा सकता है (जैसे, 80 किग्रा वजन 40 किग्रा वजन से दोगुना है)।

निरंतर चर का उपयोग मुख्य रूप से निम्नलिखित में किया जाता है:

मेडिकल अनुसंधान (जैसे, रक्तचाप, कोलेस्ट्रॉल स्तर)।
मनोवैज्ञानिक अध्ययन (जैसे, प्रतिक्रिया समय, बुद्धिमत्ता स्कोर)।
विज्ञान और इंजीनियरिंग (जैसे, गति, तापमान)।

विशेषता	श्रेणीय चर	निरंतर चर
परिभाषा	विशेषताओं या समूहों का प्रतिनिधित्व करते हैं	संख्यात्मक मान होते हैं जो मापे जा सकते हैं
उदाहरण	लिंग (पुरुष, महिला), शिक्षा स्तर	ऊंचाई (175 सेमी), तापमान (37.5°C)
गणितीय संचालन	केवल गिनती (आवृत्ति) और, क्रमबद्ध चरों में, क्रम की तुलना	संभव (जोड़, गुणा, औसत)

चर अनुसंधान का एक अनिवार्य हिस्सा हैं और अध्ययन डिजाइन, डेटा संग्रह, और सांख्यिकीय विश्लेषण में महत्वपूर्ण भूमिका निभाते हैं। श्रेणीय चर वर्गीकरण और समूहबद्ध डेटा में सहायक होते हैं, जबकि निरंतर चर सटीक माप और गणितीय विश्लेषण में उपयोग किए जाते हैं।

चर की सही पहचान और वर्गीकरण से अनुसंधान की सटीकता, विश्वसनीयता और वैधता में सुधार होता है, जिससे अधिक प्रभावी अध्ययन और वास्तविक दुनिया में अनुप्रयोग संभव होते हैं।

Introduction

Variables play a fundamental role in research and data analysis, forming the backbone of scientific investigations across various fields, including psychology, social sciences, and natural sciences. Understanding variables is crucial for designing studies, analyzing data, and interpreting results accurately.

This essay explores the meaning of variables, their significance in research, and the primary types, focusing on categorical and continuous variables. Additionally, it discusses their applications, examples, and key differences.

A variable is any characteristic, number, or quantity that can take different values. It is something that researchers measure, manipulate, or analyze in a study. Variables can represent different aspects of an experiment, such as traits, behaviors, conditions, or outcomes.

For example:

In a psychological study on stress, stress level is a variable that can vary from low to high.
In a medical study on blood pressure, blood pressure readings are variables that change from person to person.

Variables help researchers:

Identify relationships between different factors (e.g., Does sleep duration affect memory performance?).
Test hypotheses and validate theories.
Measure changes in an experimental setting.
Make predictions based on data analysis.

For example, in an experiment to test whether a new drug reduces anxiety, variables such as dosage amount (independent variable) and anxiety level (dependent variable) help measure the drug’s effect.

Variables are classified based on how they are measured and used in research. The two major types are:

Categorical Variables (Qualitative)
Continuous Variables (Quantitative)

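Categorical variables represent characteristics or qualities that can be grouped into categories or labels. Their values are names rather than measured quantities, so arithmetic such as addition or averaging is not meaningful; at most, the categories of some categorical variables (ordinal variables) can be placed in a rank order.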

For example:

Gender (Male, Female, Other)
Marital Status (Single, Married, Divorced, Widowed)
Eye Color (Brown, Blue, Green)

Categorical variables can be divided into two subtypes:

Definition: Nominal variables represent categories with no inherent order or ranking.
Examples:
- Blood type (A, B, AB, O)
- Religion (Christianity, Islam, Hinduism, Buddhism)
- Political Party (Democrat, Republican, Independent)
Key Characteristic: The categories are mutually exclusive (each person fits into only one category), but they cannot be arranged in a meaningful sequence.

Definition: Ordinal variables have categories that follow a logical order or ranking, but the difference between the categories is not necessarily equal.
Examples:
- Education Level (Primary, Secondary, College, Postgraduate)
- Socioeconomic Status (Low, Middle, High)
- Customer Satisfaction (Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied)
Key Characteristic: The categories have a rank order, but the intervals between them are not uniform.

Categorical variables are widely used in:

Survey research (e.g., collecting demographic data).
Market research (e.g., customer preferences).
Epidemiology (e.g., disease classification).

For example, a researcher studying mental health disorders may classify patients into categories such as “Anxiety,” “Depression,” and “Bipolar Disorder.”
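The difference between the two subtypes can be made concrete with a short Python sketch. The response data below are hypothetical and invented only for illustration. Counting works for both subtypes, while a median category only makes sense once the categories have a rank order.

```python
from collections import Counter
from statistics import median_low

# Hypothetical survey responses (illustrative data, not from these notes).
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]            # nominal
satisfaction = ["Satisfied", "Neutral", "Very Satisfied",
                "Dissatisfied", "Satisfied", "Neutral",
                "Satisfied"]                                   # ordinal

# Nominal: only counting is meaningful; the mode is the most frequent category.
blood_counts = Counter(blood_types)
blood_mode = blood_counts.most_common(1)[0][0]

# Ordinal: the categories have a rank order, so each label can be mapped
# to its rank and a median category can be reported.
ORDER = ["Very Dissatisfied", "Dissatisfied", "Neutral",
         "Satisfied", "Very Satisfied"]
rank = {label: position for position, label in enumerate(ORDER)}
median_rank = median_low([rank[answer] for answer in satisfaction])
median_category = ORDER[median_rank]

print(blood_mode, median_category)
```

The ranks are used only for ordering. Averaging them would assume equal spacing between categories, which an ordinal scale does not guarantee.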

Continuous variables take numerical values that are measured on a scale and can, in principle, take any of an infinite number of values within a given range.

For example:

Height (measured in cm or inches)
Weight (measured in kg or pounds)
Age (measured in years, months, or days)

Continuous variables can be further divided into two types:

Definition: Interval variables have numerical values where the difference between numbers is meaningful, but there is no true zero point.
Examples:
- Temperature in Celsius or Fahrenheit (0°C does not mean “no temperature”).
- IQ Scores (An IQ of 0 does not indicate an absence of intelligence).
- SAT Scores (the scale does not start at a true zero, so its lowest score does not mean a total lack of ability).
Key Characteristic: The difference between values is consistent, but the zero point is arbitrary (not a true absence of the measured property).

Definition: Ratio variables have all the properties of interval variables, but they include a true zero (meaning “zero” represents an absence of the property).
Examples:
- Height (0 cm means no height).
- Weight (0 kg means no weight).
- Income (0 dollars means no money).
Key Characteristic: The data can be compared using ratios (e.g., a person weighing 80 kg is twice as heavy as a person weighing 40 kg).

Continuous variables are commonly used in:

Medical research (e.g., tracking blood pressure, cholesterol levels).
Psychological studies (e.g., measuring response times, intelligence scores).
Physical sciences (e.g., recording temperature changes, measuring speed).

For example, a psychologist studying reaction time may measure how long (in milliseconds) it takes for participants to respond to a stimulus.

Feature	Categorical Variables	Continuous Variables
Definition	Represent characteristics or categories	Represent numerical values measured on a scale
Types	Nominal and Ordinal	Interval and Ratio
Measurement	Cannot be measured numerically	Can be measured with precision
Example Values	Gender (Male, Female), Education Level (Primary, Secondary)	Height (175 cm), Temperature (37.5°C)
Arithmetic Operations	Limited to counting frequencies (and comparing ranks for ordinal data)	Can perform mathematical operations (addition, multiplication, etc.)
Graphical Representation	Bar charts, Pie charts	Histograms, Line graphs

For example, hair color is a categorical variable, while body weight is a continuous variable because it can take precise values like 65.8 kg.

Variables are an essential part of research, influencing study design, data collection, and statistical analysis. Understanding categorical variables (which describe qualities or groups) and continuous variables (which measure numerical values) helps researchers make informed decisions when analyzing data.

While categorical variables help in grouping and classifying data, continuous variables allow precise measurements and mathematical operations. Both types play a crucial role in scientific research, helping to draw meaningful conclusions and advance knowledge across disciplines.

By correctly identifying and categorizing variables, researchers can enhance the accuracy, reliability, and validity of their findings, leading to more effective studies and real-world applications.

UNIT-1 (1.3) Levels of Measurement: Nominal, Ordinal, Interval, and Ratio

मापन (Measurement) किसी भी अनुसंधान और सांख्यिकीय विश्लेषण का एक मूलभूत पहलू है। डेटा को प्रभावी ढंग से एकत्र करने, विश्लेषण करने और उसकी व्याख्या करने के लिए शोधकर्ता इसे विभिन्न स्तरों पर वर्गीकृत करते हैं। स्टैनली स्मिथ स्टीवंस (Stanley Smith Stevens) ने 1946 में मापन के चार प्रमुख स्तरों की परिभाषा दी:

नाममात्रिक स्तर (Nominal Level)
क्रमबद्ध स्तर (Ordinal Level)
अंतरिक स्तर (Interval Level)
अनुपात स्तर (Ratio Level)

प्रत्येक मापन स्तर यह निर्धारित करता है कि डेटा पर कौन से सांख्यिकीय संचालन किए जा सकते हैं और डेटा की व्याख्या कैसे की जानी चाहिए। इस निबंध में इन चार स्तरों का विस्तृत अध्ययन किया जाएगा, जिसमें उनकी विशेषताएँ, भिन्नताएँ और अनुप्रयोग शामिल होंगे।

नाममात्रिक मापन स्तर उन चरों को संदर्भित करता है जो डेटा को विभिन्न समूहों या श्रेणियों में वर्गीकृत करते हैं, लेकिन उनमें कोई विशेष क्रम (Ranking) नहीं होता।

गुणात्मक (Qualitative) डेटा – यह मात्रात्मक न होकर विशेषताओं को दर्शाता है।
कोई विशेष क्रम नहीं – श्रेणियों को किसी विशेष अनुक्रम में व्यवस्थित नहीं किया जा सकता।
एक-दूसरे से अलग श्रेणियाँ – प्रत्येक डेटा बिंदु केवल एक ही श्रेणी में आता है।
कोई गणितीय संचालन नहीं – इसमें जोड़, घटाव जैसे गणितीय संचालन संभव नहीं होते।

लिंग (Gender): पुरुष, महिला, अन्य
रक्त समूह (Blood Type): A, B, AB, O
राष्ट्रीयता (Nationality): भारतीय, अमेरिकी, चीनी, फ्रांसीसी
राजनीतिक दल (Political Party): भाजपा, कांग्रेस, आप
वैवाहिक स्थिति (Marital Status): अविवाहित, विवाहित, तलाकशुदा

मोड (Mode) – सबसे अधिक बार आने वाली श्रेणी को पहचानना।
काइ-स्क्वायर परीक्षण (Chi-square test) – विभिन्न नाममात्रिक चरों के बीच संबंध निर्धारित करने के लिए।

बाजार अनुसंधान – ग्राहकों की प्राथमिकताओं का वर्गीकरण।
स्वास्थ्य अनुसंधान – बीमारियों के प्रकारों का वर्गीकरण।
सामाजिक अध्ययन – जातीयता और सांस्कृतिक पृष्ठभूमि का विश्लेषण।

उदाहरण के लिए, यदि कोई शोधकर्ता राजनीतिक प्राथमिकताओं का अध्ययन कर रहा है और उत्तरदाताओं को भाजपा, कांग्रेस, आप जैसे समूहों में वर्गीकृत करता है, तो इन श्रेणियों में कोई स्वाभाविक क्रम नहीं होगा।

क्रमबद्ध स्तर वह डेटा मापता है जिसे किसी तार्किक अनुक्रम में व्यवस्थित किया जा सकता है, लेकिन श्रेणियों के बीच का अंतर समान नहीं होता।

गुणात्मक या मात्रात्मक डेटा – इसमें श्रेणियाँ और संख्याएँ दोनों हो सकते हैं।
तार्किक क्रम (Order) होता है – डेटा को उच्च या निम्न क्रम में व्यवस्थित किया जा सकता है।
असमान अंतराल (Unequal Intervals) – श्रेणियों के बीच का अंतर एक समान नहीं होता।
सीमित गणितीय संचालन – केवल तुलना (अधिक या कम) की जा सकती है।

शिक्षा स्तर (Education Level): प्राथमिक, माध्यमिक, स्नातक, परास्नातक
सामाजिक-आर्थिक स्थिति (Socioeconomic Status): निम्न, मध्यम, उच्च
ग्राहक संतोष (Customer Satisfaction): बहुत असंतुष्ट, असंतुष्ट, संतुष्ट, बहुत संतुष्ट
दर्द की तीव्रता (Pain Severity): हल्का, मध्यम, तीव्र

माध्यिका (Median) और मोड (Mode) – लेकिन माध्य (Mean) का उपयोग नहीं किया जाता।
स्पीयरमैन रैंक सहसंबंध (Spearman’s Rank Correlation) – दो क्रमबद्ध चरों के बीच संबंध मापने के लिए।

सर्वेक्षण अनुसंधान – ग्राहक संतुष्टि का आकलन करना।
चिकित्सा अध्ययन – दर्द के स्तर को रैंक करना।
शिक्षा अनुसंधान – ग्रेडिंग प्रणाली (A, B, C, D)।

उदाहरण के लिए, होटल रेटिंग में 5-स्टार होटल को 3-स्टार होटल से बेहतर माना जाता है, लेकिन 3-स्टार और 4-स्टार के बीच की गुणवत्ता में अंतर समान नहीं हो सकता।

अंतरिक स्तर में डेटा को मापा जाता है और उसके मानों के बीच समान अंतर होता है, लेकिन इसमें एक वास्तविक शून्य बिंदु नहीं होता।

मात्रात्मक डेटा – केवल संख्यात्मक मान होते हैं।
तार्किक क्रम – डेटा एक अनुक्रम में व्यवस्थित होता है।
समान अंतराल – संख्याओं के बीच समान अंतर होता है।
कोई वास्तविक शून्य नहीं – शून्य एक मनमाना बिंदु होता है; इसका अर्थ मापी जा रही मात्रा की पूर्ण अनुपस्थिति नहीं होता।

तापमान (Temperature): 0°C का अर्थ “कोई तापमान नहीं” नहीं होता।
आईक्यू स्कोर (IQ Scores): 0 IQ का अर्थ “बुद्धिमत्ता का पूर्ण अभाव” नहीं होता।

औसत (Mean), माध्यिका (Median), और मोड (Mode) की गणना की जा सकती है।

मनोवैज्ञानिक अध्ययन – आईक्यू परीक्षण।
शैक्षिक अनुसंधान – परीक्षा स्कोर।

अनुपात स्तर अंतरिक स्तर के समान होता है, लेकिन इसमें एक वास्तविक शून्य बिंदु होता है।

मात्रात्मक डेटा – केवल संख्यात्मक मान।
समान अंतराल – माप के बीच समान अंतर होता है।
सच्चा शून्य बिंदु – 0 का अर्थ पूर्ण अनुपस्थिति होता है।

ऊंचाई (Height): 0 सेमी का अर्थ “कोई ऊंचाई नहीं”।
वजन (Weight): 0 किग्रा का अर्थ “कोई वजन नहीं”।
समय (Time): 0 सेकंड का अर्थ “कोई समय नहीं”।

सभी गणितीय संचालन – जोड़, घटाव, गुणा, भाग।

चिकित्सा अनुसंधान – रक्तचाप, हृदय गति।
अर्थशास्त्र और व्यवसाय – आय, मुनाफा।

चार मापन स्तरों को समझना आवश्यक है:

नाममात्रिक स्तर – केवल वर्गीकरण।
क्रमबद्ध स्तर – रैंकिंग संभव लेकिन अंतराल समान नहीं।
अंतरिक स्तर – समान अंतराल लेकिन वास्तविक शून्य नहीं।
अनुपात स्तर – समान अंतराल और वास्तविक शून्य, जिससे सभी गणितीय संचालन संभव होते हैं।

सही मापन स्तर चुनने से अनुसंधान की सटीकता और विश्वसनीयता बढ़ती है, जिससे अधिक प्रभावी डेटा विश्लेषण संभव होता है।

UNIT-1 (1.3) Levels of Measurement: Nominal, Ordinal, Interval, and Ratio

Measurement is a fundamental aspect of research and statistical analysis. In order to collect, analyze, and interpret data effectively, researchers categorize variables based on the level of measurement they represent. The concept of levels of measurement was introduced by Stanley Smith Stevens in 1946 and includes four main types:

Nominal Level
Ordinal Level
Interval Level
Ratio Level

Each level of measurement determines the type of statistical operations that can be performed and how data can be interpreted. This essay explores these levels in detail, highlighting their characteristics, differences, and applications in various fields.

The nominal level of measurement refers to variables that categorize data without any inherent order or ranking. These variables represent labels or names that classify data into distinct groups.

Qualitative (categorical) data – Represents attributes rather than numerical values.
No meaningful order – Categories cannot be ranked in a logical sequence.
Mutually exclusive – Each data point belongs to only one category.
No mathematical operations – Arithmetic calculations (e.g., addition, subtraction) are not meaningful.

Gender: Male, Female, Other
Blood Type: A, B, AB, O
Nationality: American, Indian, Chinese, French
Political Affiliation: Democrat, Republican, Independent
Marital Status: Single, Married, Divorced

Mode (most frequently occurring category).
Chi-square test (to assess relationships between nominal variables).

Market research (classifying consumer preferences).
Healthcare studies (categorizing disease types).
Sociological studies (analyzing ethnicity and cultural backgrounds).

For example, a researcher studying political preferences may categorize respondents as Democrat, Republican, or Independent, but these groups have no inherent order.

The ordinal level of measurement represents data that can be categorized and ranked in a meaningful order, but the intervals between categories are not necessarily equal.

Qualitative or quantitative data – Can include both categories and numbers.
Meaningful order – Data can be ranked (e.g., high to low, best to worst).
Unequal intervals – Differences between ranks are not consistent.
Limited mathematical operations – Only comparisons (greater than, less than) are valid.

Education Level: Primary, Secondary, College, Postgraduate
Socioeconomic Status: Low, Middle, High
Customer Satisfaction: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
Pain Severity: Mild, Moderate, Severe

Median and mode (but not mean).
Spearman’s rank correlation (to measure relationships).
Mann-Whitney U test (to compare two ordinal groups).

Survey research (assessing customer satisfaction).
Medical studies (ranking pain levels).
Educational research (grading performance as A, B, C, D).

For example, in a hotel rating system, a 5-star hotel is ranked higher than a 3-star hotel, but the difference in quality between 3-star and 4-star may not be the same as between 4-star and 5-star.

The interval level of measurement represents data that is ordered and has equal intervals between values but does not have a true zero point.

Quantitative data – Only numerical values are used.
Meaningful order – Values follow a logical sequence.
Equal intervals – Differences between values are consistent.
No true zero – Zero does not mean the absence of a quantity.

Temperature in Celsius or Fahrenheit (0°C does not mean “no temperature”).
IQ Scores (0 IQ does not mean “no intelligence”).
SAT Scores (the scale does not start at a true zero, so its lowest score does not mean a total lack of ability).

Mean, median, and mode can be calculated.
Standard deviation and variance can be measured.
t-tests and ANOVA can be used for hypothesis testing.

Psychological studies (measuring IQ).
Education research (exam scores).
Climate studies (temperature trends).

For example, in an IQ test, a score of 120 is higher than a score of 100, and the difference between 100 and 110 is the same as between 110 and 120, but a score of 0 does not mean “no intelligence.”
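Why ratios are not meaningful on an interval scale can be checked numerically. The sketch below uses temperature rather than IQ, because the Celsius-to-Fahrenheit conversion is a known formula; the three temperature values are arbitrary illustrative choices.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three equally spaced readings (arbitrary illustrative values).
temps_c = [10.0, 20.0, 30.0]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]

# Equal intervals survive a change of unit: the steps stay equal to each other.
steps_c = [temps_c[1] - temps_c[0], temps_c[2] - temps_c[1]]
steps_f = [temps_f[1] - temps_f[0], temps_f[2] - temps_f[1]]

# Ratios do not survive: "twice as hot" in Celsius is not twice in Fahrenheit.
ratio_c = temps_c[1] / temps_c[0]
ratio_f = temps_f[1] / temps_f[0]

print(steps_c, steps_f)
print(ratio_c, round(ratio_f, 2))
```

Because the zero point is arbitrary, the ratio depends on the unit chosen, so a statement such as "20°C is twice as hot as 10°C" has no fixed meaning.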

The ratio level of measurement includes all the properties of interval measurement but also has a true zero point, meaning zero represents the complete absence of the measured variable.

Quantitative data – Numeric values only.
Meaningful order – Values can be ranked logically.
Equal intervals – Differences between values are consistent.
True zero point – Zero means “nothing” or “absence” of the quantity.
All mathematical operations (addition, subtraction, multiplication, division) can be performed.

Height (cm, inches) – 0 cm means no height.
Weight (kg, pounds) – 0 kg means no weight.
Income ($, ₹) – 0 dollars means no income.
Time (seconds, minutes) – 0 seconds means no time.

All statistical methods used in interval data.
Geometric mean and coefficient of variation can be applied.
Regression analysis and ratio comparisons are possible.

Medical research (measuring blood pressure, cholesterol levels).
Sports science (tracking running speeds, heart rates).
Economics and business (analyzing revenue and profits).

For example, in weight measurement, 80 kg is twice as heavy as 40 kg, and 0 kg means “no weight,” making it a ratio variable.

Feature	Nominal	Ordinal	Interval	Ratio
Data Type	Categorical	Categorical/Quantitative	Quantitative	Quantitative
Order of Values	No	Yes	Yes	Yes
Equal Intervals	No	No	Yes	Yes
True Zero Point	No	No	No	Yes
Examples	Gender, Blood Type	Education Level, Satisfaction Rating	IQ Score, Temperature	Height, Weight, Income
Mathematical Operations	Counting	Ranking	Addition, Subtraction	All operations
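The comparison table can also be written as a small lookup, which some readers may find easier to apply when deciding how to summarize a variable. This is only a sketch: the names `Level`, `LEVELS`, and `classify` are hypothetical, and the measures listed for each level are the ones named in these notes.

```python
from typing import NamedTuple

class Level(NamedTuple):
    name: str
    ordered: bool
    equal_intervals: bool
    true_zero: bool
    central_tendency: tuple

# One row per level, mirroring the comparison table.
LEVELS = (
    Level("nominal", False, False, False, ("mode",)),
    Level("ordinal", True, False, False, ("mode", "median")),
    Level("interval", True, True, False, ("mode", "median", "mean")),
    Level("ratio", True, True, True,
          ("mode", "median", "mean", "geometric mean")),
)

def classify(ordered: bool, equal_intervals: bool, true_zero: bool) -> Level:
    """Return the level whose three properties match the answers given."""
    answers = (ordered, equal_intervals, true_zero)
    for level in LEVELS:
        if (level.ordered, level.equal_intervals, level.true_zero) == answers:
            return level
    raise ValueError("combination does not match any of the four levels")

# IQ score: ordered, equal intervals, no true zero.
print(classify(True, True, False).name)
# Pain severity: ordered only.
print(classify(True, False, False).central_tendency)
```

Answering the three yes/no questions in order (ranked? equal intervals? true zero?) is enough to place a variable on one of the four levels.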

Understanding the four levels of measurement—nominal, ordinal, interval, and ratio—is essential for researchers to determine the appropriate statistical techniques and data analysis methods.

Nominal data classifies without order.
Ordinal data ranks with no equal intervals.
Interval data has equal intervals but no true zero.
Ratio data has equal intervals and a true zero, allowing for all mathematical calculations.

Choosing the correct level of measurement ensures accurate data interpretation, meaningful comparisons, and reliable statistical analysis across various fields like psychology, business, medicine, and social sciences.

UNIT-2 (2.1) Basic Concepts of Descriptive and Inferential Statistics

सांख्यिकी (Statistics) गणित की एक शाखा है, जो डेटा के संग्रहण, विश्लेषण, व्याख्या और प्रस्तुति से संबंधित है। यह व्यवसाय, स्वास्थ्य, मनोविज्ञान, सामाजिक विज्ञान और इंजीनियरिंग सहित विभिन्न क्षेत्रों में महत्वपूर्ण भूमिका निभाती है। सांख्यिकी को मुख्य रूप से दो भागों में विभाजित किया जाता है:

वर्णनात्मक सांख्यिकी (Descriptive Statistics) – यह डेटा को सारांशित और व्यवस्थित करने का कार्य करती है।
अनुमानात्मक सांख्यिकी (Inferential Statistics) – यह एक छोटे नमूने (Sample) से पूरे जनसंख्या (Population) के बारे में निष्कर्ष निकालने में मदद करती है।

दोनों प्रकार के सांख्यिकी डेटा विश्लेषण में अलग-अलग कार्य करते हैं, लेकिन वे आपस में जुड़े हुए हैं। इस निबंध में इनकी मूल अवधारणाओं, अंतर, तकनीकों और अनुप्रयोगों का विस्तार से अध्ययन किया जाएगा।

वर्णनात्मक सांख्यिकी वे विधियाँ होती हैं, जो किसी डेटा को सारांशित (Summarize) और व्यवस्थित (Organize) करके उसे समझने योग्य बनाती हैं। यह डेटा का केवल वर्णन करती है और उसके आधार पर कोई निष्कर्ष या पूर्वानुमान नहीं लगाती।

बड़े डेटा सेट को सारांशित करती है तालिकाओं, ग्राफ़ और संख्यात्मक मापों के माध्यम से।
कोई निष्कर्ष या पूर्वानुमान नहीं लगाती, केवल डेटा का वर्णन करती है।
पैटर्न और संबंध खोजने के लिए उपयोग की जाती है।

ये माप डेटा के केंद्र को दर्शाते हैं:

माध्य (Mean): सभी मूल्यों का योग, कुल मूल्यों की संख्या से विभाजित।
- उदाहरण: किसी कक्षा के छात्रों की औसत ऊँचाई।
माध्यिका (Median): डेटा को बढ़ते क्रम में व्यवस्थित करने के बाद मध्य में स्थित मान।
- उदाहरण: एक समूह की माध्यिका आय।
बहुलक (Mode): सबसे अधिक बार आने वाला मान।
- उदाहरण: किसी परीक्षा में सबसे अधिक छात्रों द्वारा प्राप्त किया गया अंक।

ये माप यह दर्शाते हैं कि डेटा कितना फैला हुआ है।

परास (Range): सबसे बड़ा मान – सबसे छोटा मान।
- उदाहरण: यदि परीक्षा में उच्चतम अंक 95 और न्यूनतम 55 हैं, तो परास 40 होगा।
प्रसरण (Variance): माध्य से विचलनों के वर्गों का औसत, जो डेटा के फैलाव को दर्शाता है।
मानक विचलन (Standard Deviation): प्रसरण का वर्गमूल, डेटा के फैलाव को मापने के लिए।
- उदाहरण: यदि दो कक्षाओं में औसत अंक समान हैं, लेकिन एक में मानक विचलन अधिक है, तो उस कक्षा के अंकों में अधिक विविधता होगी।

डेटा को ग्राफ़ के रूप में प्रस्तुत करने से उसे समझना आसान हो जाता है।

हिस्टोग्राम (Histogram): डेटा की आवृत्ति (Frequency) को दर्शाता है।
बार ग्राफ़ (Bar Graph): विभिन्न श्रेणियों की तुलना करता है।
पाई चार्ट (Pie Chart): विभिन्न श्रेणियों का अनुपात दर्शाता है।
बॉक्स प्लॉट (Box Plot): माध्यिका, चतुर्थक (Quartiles) और बाह्य मूल्यों (Outliers) को दिखाता है।

शिक्षा: छात्रों के परीक्षा परिणामों का विश्लेषण।
व्यवसाय: ग्राहकों की प्राथमिकताओं को समझना।
स्वास्थ्य: मरीजों के स्वास्थ्य रिकॉर्ड का सारांश बनाना।
खेल: खिलाड़ियों की प्रदर्शन सांख्यिकी की तुलना करना।

उदाहरण के लिए, एक कंपनी अलग-अलग शहरों में मासिक बिक्री के औसत की गणना करने के लिए वर्णनात्मक सांख्यिकी का उपयोग कर सकती है।

अनुमानात्मक सांख्यिकी वह विधियाँ होती हैं, जो एक छोटे नमूने (Sample) के आधार पर पूरी जनसंख्या (Population) के बारे में निष्कर्ष निकालती हैं।

नमूने के आधार पर जनसंख्या का पूर्वानुमान करती है।
संभाव्यता सिद्धांत (Probability Theory) पर आधारित होती है।
परिकल्पना परीक्षण (Hypothesis Testing) और विश्वास अंतराल (Confidence Intervals) का उपयोग करती है।

जनसंख्या (Population): संपूर्ण समूह जिसका अध्ययन किया जा रहा है।
नमूना (Sample): जनसंख्या का एक छोटा भाग, जिस पर अध्ययन किया जाता है।
यादृच्छिक नमूनाकरण (Random Sampling): सुनिश्चित करता है कि प्रत्येक व्यक्ति को चुने जाने का समान अवसर मिले।

यह परीक्षण यह जाँचने में मदद करता है कि नमूने के आँकड़े जनसंख्या के बारे में की गई किसी धारणा (शून्य परिकल्पना) को अस्वीकार करने के लिए पर्याप्त प्रमाण देते हैं या नहीं।

शून्य परिकल्पना (H₀): कहती है कि कोई महत्वपूर्ण अंतर या प्रभाव नहीं है।
वैकल्पिक परिकल्पना (H₁): कहती है कि कोई महत्वपूर्ण अंतर या प्रभाव है।

उदाहरण:
यदि एक कंपनी दावा करती है कि उसका नया आहार पूरक 1 महीने में 5 किलो वजन कम करता है, तो इस दावे की वैधता परिकल्पना परीक्षण द्वारा सत्यापित की जा सकती है।

विश्वास अंतराल वह सीमा बताता है जिसके भीतर जनसंख्या के किसी प्राचल (Parameter, जैसे माध्य) के होने की संभावना होती है।

उदाहरण:
यदि एक सर्वेक्षण से पता चलता है कि 60% लोग किसी नेता का समर्थन करते हैं और 95% विश्वास स्तर पर त्रुटि सीमा (Margin of Error) ±3 प्रतिशत अंक है, तो वास्तविक समर्थन स्तर संभवतः 57% से 63% के बीच है।

सहसंबंध (Correlation): दो चरों के बीच संबंध को मापता है।
प्रतिगमन (Regression): एक चर के आधार पर दूसरे का अनुमान लगाता है।

उदाहरण:
एक अध्ययन में पाया गया कि अध्ययन के घंटे और परीक्षा स्कोर के बीच सकारात्मक सहसंबंध (r = 0.85) है, यानी अधिक अध्ययन करने वाले छात्रों के अंक अधिक होते हैं।

t-परीक्षण (t-Test): दो समूहों की औसत तुलना करता है।
ANOVA: तीन या अधिक समूहों की औसत तुलना करता है।

उदाहरण:
t-परीक्षण का उपयोग यह जांचने के लिए किया जा सकता है कि पुरुष और महिला कर्मचारियों के वेतन में कोई महत्वपूर्ण अंतर है या नहीं।

चिकित्सा: किसी नई दवा की प्रभावशीलता जांचना।
अर्थशास्त्र: भविष्य में मुद्रास्फीति दर की भविष्यवाणी करना।
विपणन: उपभोक्ता व्यवहार का विश्लेषण करना।
राजनीति: चुनाव परिणामों का पूर्वानुमान लगाना।

विशेषता	वर्णनात्मक सांख्यिकी	अनुमानात्मक सांख्यिकी
उद्देश्य	डेटा का सारांश देना	निष्कर्ष निकालना
डेटा उपयोग	पूरे डेटा का उपयोग	नमूने पर आधारित
तकनीकें	माध्य, माध्यिका, बहुलक	परिकल्पना परीक्षण, प्रतिगमन

वर्णनात्मक और अनुमानात्मक सांख्यिकी डेटा विश्लेषण के दो आवश्यक भाग हैं। वर्णनात्मक सांख्यिकी डेटा को सारांशित करती है, जबकि अनुमानात्मक सांख्यिकी पूर्वानुमान और निष्कर्ष निकालने में मदद करती है। दोनों का उपयोग विज्ञान, व्यापार, चिकित्सा और अन्य क्षेत्रों में किया जाता है, जिससे डेटा-संचालित निर्णय लेने में सहायता मिलती है।

UNIT-2 (2.1) Basic Concepts of Descriptive and Inferential Statistics

Statistics is a branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data. It plays a crucial role in various fields, including business, healthcare, psychology, social sciences, and engineering. Broadly, statistics is divided into two major types:

Descriptive Statistics – It deals with summarizing and organizing data.
Inferential Statistics – It involves making predictions and generalizations from a sample to a population.

Both types serve different purposes but are interconnected in data analysis. This essay explores their fundamental concepts, differences, techniques, and applications.

Descriptive statistics refers to the methods used to summarize and organize data in a meaningful way. It provides simple summaries and graphical representations of data but does not allow for generalizations beyond the specific dataset.

Summarizes large data sets using tables, graphs, and numerical measures.
Draws no conclusions or predictions about the population; it only describes the data at hand.
Used in the initial analysis of raw data to find patterns and relationships.

Descriptive statistics can be divided into three main categories:

These measures represent the center or typical value of a dataset.

Mean (Arithmetic Average): Sum of all values divided by the total number of observations.
- Example: The average height of students in a class.
Median: The middle value when data is arranged in ascending order.
- Example: The median income of a group of people.
Mode: The most frequently occurring value in a dataset.
- Example: The most common exam score in a classroom.
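The three measures can be computed with Python's standard `statistics` module. The ten exam scores below are illustrative data chosen for this sketch.

```python
from statistics import mean, median, mode

# Ten exam scores (illustrative data).
scores = [85, 90, 78, 85, 88, 92, 78, 85, 90, 88]

average = mean(scores)      # sum of all values / number of observations
middle = median(scores)     # middle value of the sorted data
most_common = mode(scores)  # most frequently occurring value

print(average, middle, most_common)
```

With an even number of observations, as here, the median is the average of the two middle values of the sorted list.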

These measures show how much the data varies or spreads out.

Range: The difference between the highest and lowest value.
- Example: If the highest test score is 95 and the lowest is 55, the range is 40.
Variance: The average squared difference from the mean, showing how spread out the data is.
Standard Deviation: The square root of variance, used to measure data dispersion.
- Example: If two groups have the same average score but different standard deviations, the group with a higher standard deviation has more variation.
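The point about equal means but different spreads can be checked with a small sketch. Both groups of scores are hypothetical. The functions `pvariance` and `pstdev` divide by the number of observations, which matches the "average squared difference" definition above; the sample versions `variance` and `stdev` divide by n − 1 instead.

```python
from statistics import mean, pvariance, pstdev

# Two hypothetical groups with the same mean but different spread.
group_a = [70, 75, 80, 85, 90]
group_b = [78, 79, 80, 81, 82]

def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

for name, group in (("A", group_a), ("B", group_b)):
    print(name, mean(group), data_range(group),
          pvariance(group), round(pstdev(group), 2))
```

Group A has the larger standard deviation, so its scores vary more around the shared mean of 80.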

Visual representation makes data easier to understand.

Histograms: Used for frequency distribution.
Bar Graphs: Used for categorical data comparisons.
Pie Charts: Show proportions of categories.
Box Plots: Show median, quartiles, and outliers.

Education: Analyzing student performance.
Business: Understanding customer preferences.
Healthcare: Summarizing patient health records.
Sports: Comparing player statistics.

For example, a company may use descriptive statistics to determine the average monthly sales in different regions.

Inferential statistics involves making predictions or inferences about a population based on a sample. It helps researchers determine the probability that their conclusions apply to a larger group.

Uses sample data to make predictions about a population.
Involves probability theory to determine the reliability of results.
Can be used for hypothesis testing and confidence intervals.

Inferential statistics primarily includes hypothesis testing and estimation techniques to draw conclusions.

Population: The entire group being studied.
Sample: A smaller subset of the population used for analysis.
Random Sampling: A method to ensure every individual has an equal chance of selection.

For example, to estimate the average height of all university students, a researcher might measure a sample of 500 students.
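A simulation gives a feel for how a random sample stands in for a population. Everything in this sketch is an assumption made for illustration: the population of 20,000 heights is generated artificially (mean 165 cm, standard deviation 10 cm), and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20,000 simulated student heights in cm.
population = [rng.gauss(165.0, 10.0) for _ in range(20_000)]

# Simple random sample: every student has an equal chance of selection.
sample = rng.sample(population, 500)

population_mean = fmean(population)
sample_mean = fmean(sample)

print(round(population_mean, 1), round(sample_mean, 1))
```

The sample mean will usually land close to the population mean but not exactly on it; quantifying that gap is the job of inferential statistics.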

Hypothesis testing assesses whether sample data provide enough evidence to reject an assumption (hypothesis) about a population.

Null Hypothesis (H₀): States there is no significant difference or effect.
Alternative Hypothesis (H₁): Suggests a significant difference or effect exists.

Example:
A company claims that its new diet pill helps people lose 5 kg in a month. A study with 100 participants is conducted to test whether the observed weight loss is consistent with this claim.

A confidence interval estimates the range in which a population parameter (e.g., mean) is likely to fall.

Example:
A survey finds that 60% of voters support a candidate, with a margin of error of ±3 percentage points at the 95% confidence level. This means the true support level is likely between 57% and 63%.
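The survey example can be reproduced with the usual normal-approximation formula for a proportion, margin = z × √(p(1 − p)/n). The notes do not state the sample size, so the value of 1,024 respondents below is an assumption chosen because it yields a margin of about 3 percentage points; the function name `proportion_interval` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# 60% support among an assumed 1,024 respondents.
low, high, margin = proportion_interval(0.60, 1024)
print(round(low, 3), round(high, 3), round(margin, 3))
```

Because the margin shrinks with the square root of n, halving it requires roughly four times as many respondents.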

Correlation: Measures the relationship between two variables (e.g., height and weight).
Regression: Predicts the value of one variable based on another.

Example:
A study finds a positive correlation (r = 0.85) between hours studied and exam scores, meaning students who study more tend to score higher.
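A correlation coefficient of this kind can be computed with the product-moment (Pearson) formula. The hours and scores below are hypothetical and are not meant to reproduce the r = 0.85 quoted above; the helper name `pearson_r` is also an assumption.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]

r = pearson_r(hours, scores)
print(round(r, 3))
```

A value near +1 indicates that higher study hours go with higher scores in this sample; it does not by itself show that studying causes the higher scores.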

t-Test: Compares the means of two groups.
ANOVA: Compares the means of three or more groups.

Example:
A t-test could compare the average salary of male and female employees to determine if there is a significant difference.
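For two independent groups, the t statistic can be sketched with the pooled-variance (Student's) formula, which assumes the two groups have similar variances. The salary figures (in thousands) are hypothetical, and `independent_t` is an illustrative helper name rather than a library function.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df

# Hypothetical monthly salaries (in thousands) for two groups of employees.
salaries_1 = [50, 52, 48, 51, 49]
salaries_2 = [45, 47, 46, 44, 48]

t_value, df = independent_t(salaries_1, salaries_2)
print(round(t_value, 2), df)
```

The computed t is then compared with the critical value from a t table for the same degrees of freedom (2.306 for df = 8 at the two-tailed 0.05 level); a larger absolute t indicates a statistically significant difference.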

Medicine: Determining if a new drug is effective.
Economics: Predicting future inflation rates.
Marketing: Analyzing customer behavior trends.
Political Science: Predicting election results.

For instance, political analysts use inferential statistics to predict election outcomes based on pre-election surveys.

Feature	Descriptive Statistics	Inferential Statistics
Purpose	Summarizes and describes data	Makes predictions and generalizations
Data Usage	Uses the entire dataset	Uses a sample to infer about a population
Techniques	Measures of central tendency, dispersion, graphs	Hypothesis testing, confidence intervals, regression
Example	Average test score of students in one school	Predicting national student performance based on a sample

For example, calculating the average salary of employees in a company is descriptive statistics, but using a sample to predict the national average salary is inferential statistics.

Both descriptive and inferential statistics are essential for decision-making and research:

In Science and Research: Helps in analyzing experiments and drawing conclusions.
In Business and Marketing: Assists in understanding market trends and customer behavior.
In Healthcare: Used for clinical trials and medical research.
In Education: Helps evaluate student performance and teaching methods.
In Government and Policy Making: Guides policy decisions and economic planning.

For example, the COVID-19 pandemic saw extensive use of descriptive statistics (tracking daily cases) and inferential statistics (predicting future infection rates).

Descriptive and inferential statistics are two fundamental branches of statistical analysis. Descriptive statistics helps summarize data, while inferential statistics allows us to make predictions and draw conclusions. Both are widely used in various fields, from research and medicine to business and social sciences. Understanding these concepts enables researchers and professionals to make informed, data-driven decisions.

By applying the right statistical methods, we can gain valuable insights from data, ultimately leading to better planning, innovation, and problem-solving in diverse fields.

UNIT-2 (2.2) Frequency Distribution of Data and Graphic Presentation: Histogram, Polygon and Ogive

सांख्यिकी में, डेटा को अक्सर बड़ी मात्रा में एकत्र किया जाता है, जिससे इसे सीधे समझना और विश्लेषण करना मुश्किल हो सकता है। इसलिए, डेटा को आवृत्ति वितरण (Frequency Distribution) में व्यवस्थित किया जाता है और उसे विभिन्न ग्राफ़िक विधियों (Graphical Methods) जैसे हिस्टोग्राम (Histogram), आवृत्ति बहुभुज (Frequency Polygon), और ओगिव (Ogive) द्वारा प्रदर्शित किया जाता है।

ये ग्राफ़ डेटा की प्रवृत्ति, वितरण और पैटर्न को आसानी से समझने में मदद करते हैं। इस निबंध में हम आवृत्ति वितरण की अवधारणा, इसके प्रकार, निर्माण की विधियाँ, और इसके ग्राफ़िक प्रतिनिधित्व को विस्तार से समझेंगे।

आवृत्ति वितरण एक सांख्यिकीय तकनीक है, जिसमें डेटा को अलग-अलग वर्गों (Classes) या समूहों (Groups) में विभाजित किया जाता है और हर वर्ग में आने वाले मानों की संख्या (Frequency) को दर्ज किया जाता है। यह बड़ी मात्रा में डेटा को संक्षेप में प्रस्तुत करने में मदद करता है।

कच्चे (Raw) डेटा को एक व्यवस्थित तालिका में बदलता है।
यह दिखाता है कि डेटा के विभिन्न मान कितनी बार दोहराए गए हैं।
डेटा पैटर्न और रुझानों को समझने में मदद करता है।
विभिन्न ग्राफ़िक विधियों द्वारा इसे प्रदर्शित किया जा सकता है।

जब डेटा के मान कम होते हैं, तो उन्हें व्यक्तिगत रूप से सूचीबद्ध किया जाता है।

उदाहरण:
10 छात्रों के परीक्षा अंकों का डेटा:
{85, 90, 78, 85, 88, 92, 78, 85, 90, 88}

इसका आवृत्ति वितरण तालिका:

अंक (x)	आवृत्ति (f)
78	2
85	3
88	2
90	2
92	1

जब डेटा बड़ा होता है, तो इसे कुछ वर्गों (Class Intervals) में विभाजित किया जाता है।

उदाहरण:
अगर किसी कक्षा में छात्रों के अंक 40 से 100 के बीच हैं, तो उन्हें 10 के अंतराल (Class Width) में विभाजित किया जा सकता है।

वर्ग अंतराल (Class Interval)	आवृत्ति (f)
40 – 49	3
50 – 59	5
60 – 69	8
70 – 79	10
80 – 89	7
90 – 99	4

डेटा एकत्र करना – अध्ययन के लिए आवश्यक डेटा प्राप्त करें।
सीमा (Range) ज्ञात करें – अधिकतम और न्यूनतम मानों के बीच का अंतर निकालें।
वर्गों की संख्या तय करें – आमतौर पर 5 से 10 वर्गों का चयन किया जाता है।
वर्ग चौड़ाई तय करें – सीमा को वर्गों की संख्या से विभाजित करें।
वर्ग अंतराल बनाएँ – प्रत्येक वर्ग को एक निश्चित श्रेणी में रखें।
प्रत्येक वर्ग की आवृत्ति गिनें – यह निर्धारित करें कि प्रत्येक वर्ग में कितने डेटा अंक आते हैं।

हिस्टोग्राम एक प्रकार का बार ग्राफ़ (Bar Graph) है, जो किसी डेटा के आवृत्ति वितरण को दर्शाता है। इसमें बार्स (Bars) जुड़े हुए होते हैं, जिससे यह दर्शाया जाता है कि डेटा सतत (Continuous) है।

x-अक्ष (Horizontal Axis) पर वर्ग अंतराल होते हैं।
y-अक्ष (Vertical Axis) पर आवृत्ति होती है।
बार्स के बीच कोई अंतर (Gap) नहीं होता।

x-अक्ष पर वर्ग अंतराल चिह्नित करें।
y-अक्ष पर प्रत्येक वर्ग की आवृत्ति को दर्शाएँ।
प्रत्येक वर्ग के लिए आयत (Rectangles) बनाएँ, जिनकी ऊँचाई आवृत्ति के बराबर हो।

यह डेटा वितरण को दर्शाने में मदद करता है।
यह सामान्य (Normal), असामान्य (Skewed) या द्विक पर्वतीय (Bimodal) वितरण को दिखाता है।
यह व्यवसाय, विज्ञान और अनुसंधान में व्यापक रूप से प्रयोग किया जाता है।

आवृत्ति बहुभुज (Frequency Polygon) एक रेखा ग्राफ़ (Line Graph) होता है, जो वर्ग अंतरालों के मध्य बिंदुओं (Midpoints) को आवृत्ति के अनुसार जोड़कर बनाया जाता है।

x-अक्ष पर वर्ग मध्यबिंदु होते हैं।
y-अक्ष पर आवृत्तियाँ होती हैं।
बिंदुओं को रेखाओं द्वारा जोड़ा जाता है।

प्रत्येक वर्ग का मध्यबिंदु (Midpoint) निकालें: $\text{मध्यबिंदु} = \frac{\text{निम्न सीमा} + \text{उच्च सीमा}}{2}$
प्रत्येक मध्यबिंदु के लिए आवृत्ति बिंदु (Points) बनाएँ।
बिंदुओं को रेखा द्वारा जोड़ें।
बहुभुज को दोनों सिरों पर x-अक्ष से मिलाएँ।

यह डेटा वितरण को स्पष्ट रूप से दर्शाता है।
यह विभिन्न डेटा समूहों की तुलना करने के लिए उपयोगी होता है।
यह समय-श्रृंखला (Time-Series) डेटा में बदलाव को दिखाने में मदद करता है।

ओगिव (Ogive) एक वक्र (Curve) होता है, जो संचयी आवृत्तियों (Cumulative Frequencies) को प्रदर्शित करता है।

Less than Ogive: इसमें उन मूल्यों की कुल संख्या होती है जो किसी वर्ग सीमा से कम होते हैं।
More than Ogive: इसमें उन मूल्यों की कुल संख्या होती है जो किसी वर्ग सीमा से अधिक होते हैं।

संचयी आवृत्ति तालिका बनाएँ।
वर्ग सीमाओं के खिलाफ संचयी आवृत्तियों को प्लॉट करें।
एक चिकनी वक्र (Smooth Curve) बनाएँ।

यह माध्यिका (Median) और प्रतिशतक (Percentiles) को निर्धारित करने में मदद करता है।
यह डेटा संचय (Cumulative Growth) को दिखाता है।

आवृत्ति वितरण डेटा को व्यवस्थित करने का एक महत्वपूर्ण तरीका है, जिससे डेटा को आसानी से समझा और विश्लेषण किया जा सकता है। हिस्टोग्राम, आवृत्ति बहुभुज और ओगिव जैसे ग्राफ़िक उपकरण डेटा वितरण के पैटर्न को स्पष्ट रूप से प्रदर्शित करते हैं। इन तकनीकों का उपयोग विभिन्न क्षेत्रों जैसे शिक्षा, व्यापार, चिकित्सा, और अनुसंधान में किया जाता है ताकि निर्णय लेने में सहायता मिल सके।

UNIT-2 (2.2) Frequency distribution of data and Graphic presentation: Histogram, Polygon and Ogive.

In statistics, data is often collected in large quantities, making it difficult to analyze or interpret in its raw form. To make sense of data, statisticians organize it into a structured form known as a frequency distribution and represent it visually using different graphs, including histograms, frequency polygons, and ogives. These graphical representations help in understanding patterns, trends, and distributions of data effectively.

This essay explores the concept of frequency distribution, its types, methods of construction, and its graphical representations, specifically histogram, frequency polygon, and ogive.

A frequency distribution is a systematic way of arranging data into classes or groups along with their corresponding frequencies (the number of times a particular value or group of values appears). It helps in summarizing large datasets for easier analysis.

Organizes raw data into a structured format.
Displays the frequency (count) of values in each group.
Helps identify trends and patterns in data.
Useful for statistical analysis and graphical representation.

Ungrouped Frequency Distribution

Used when data values are few and can be listed individually.
Example: Test scores of 10 students – {85, 90, 78, 85, 88, 92, 78, 85, 90, 88}.
Frequency table:

Score (x)	Frequency (f)
78	2
85	3
88	2
90	2
92	1
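The table above can be reproduced with a short Python sketch. It uses only the ten scores given in the example; the variable names are illustrative choices, not part of the original notes.

```python
from collections import Counter

# The ten test scores from the example above.
scores = [85, 90, 78, 85, 88, 92, 78, 85, 90, 88]

# Count how often each distinct score occurs, then sort by score.
frequency_table = sorted(Counter(scores).items())

for score, frequency in frequency_table:
    print(f"{score}\t{frequency}")
```

The frequencies should add up to the number of observations (10 here), which is a quick check that no score was missed.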

Grouped Frequency Distribution

Used for large datasets by dividing data into class intervals.
Example: Student test scores ranging from 40 to 100, grouped in class intervals of 10.

Class Interval	Frequency (f)
40 – 49	3
50 – 59	5
60 – 69	8
70 – 79	10
80 – 89	7
90 – 99	4

Collect the data – Gather the dataset to be analyzed.
Determine the range – Find the difference between the highest and lowest values.
Select the number of class intervals – Typically between 5 and 10 for readability.
Determine class width – Divide the range by the number of intervals.
Create class intervals – Ensure they are mutually exclusive and exhaustive.
Count the frequency – Record how many values fall into each interval.
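The steps above can be sketched in Python as follows. The notes give only the final frequencies, not the raw scores, so `sample_scores` is a hypothetical list invented for illustration, and the helper name `grouped_frequencies` is likewise an assumption. The sketch also assumes integer scores and equal class widths.

```python
def grouped_frequencies(values, start, width, n_classes):
    """Tally integer values into classes start..start+width-1, and so on."""
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    # Return (lower limit, upper limit, frequency) for each class.
    return [
        (start + i * width, start + (i + 1) * width - 1, counts[i])
        for i in range(n_classes)
    ]

# Hypothetical scores, invented for illustration only.
sample_scores = [42, 47, 55, 58, 61, 63, 66, 71, 74, 78, 79, 83, 88, 95]

table = grouped_frequencies(sample_scores, start=40, width=10, n_classes=6)
for lower, upper, frequency in table:
    print(f"{lower} – {upper}\t{frequency}")
```

Because every value lands in exactly one class or raises an error, the classes produced this way are mutually exclusive and exhaustive for the data supplied.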

Graphical representation makes it easier to visualize data patterns. The three primary graphs for frequency distributions are histograms, frequency polygons, and ogives.

A histogram is a bar-type graph that represents the frequency distribution of a dataset. Unlike an ordinary bar chart, its bars are adjacent, showing that the underlying variable is continuous.

The x-axis (horizontal axis) represents the class intervals.
The y-axis (vertical axis) represents the frequency of occurrences.
Bars are adjacent, indicating continuous data.

Draw x-axis and label it with class intervals.
Draw y-axis and label it with frequencies.
Draw rectangular bars for each class interval, where the height represents the frequency.
Ensure there are no gaps between bars.

Class Interval	Frequency (f)
40 – 49	3
50 – 59	5
60 – 69	8
70 – 79	10
80 – 89	7
90 – 99	4

In the histogram, the bars for each class interval would have the following heights: 3, 5, 8, 10, 7, and 4.
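A small Python sketch of how the bars can be laid out without gaps. The class limits 40 – 49 and 50 – 59 leave a gap of one unit, so the sketch widens each bar by half that gap on both sides. This half-unit adjustment is a common convention and is an assumption here; the notes do not state it.

```python
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [3, 5, 8, 10, 7, 4]

# Gap between one class's upper limit and the next class's lower limit (1 here).
gap = class_limits[1][0] - class_limits[0][1]

# Each bar runs from (lower limit - gap/2) to (upper limit + gap/2).
bars = [
    (lower - gap / 2, upper + gap / 2, frequency)
    for (lower, upper), frequency in zip(class_limits, frequencies)
]

# Text rendering: one row per bar, one '#' per observation.
for left, right, height in bars:
    print(f"{left:5.1f} to {right:5.1f} | {'#' * height} ({height})")
```

With these boundaries the right edge of each bar coincides with the left edge of the next, which is what makes the bars touch.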

Helps in understanding the distribution shape (normal, skewed, bimodal, etc.).
Useful in statistical analysis for large datasets.
Commonly used in quality control and research studies.

A frequency polygon is a line graph that connects the midpoints of the tops of histogram bars, providing a smoother representation of data distribution.

The x-axis represents class midpoints.
The y-axis represents frequencies.
A continuous line is drawn through plotted points.

Find the midpoint of each class interval: $\text{Midpoint} = \frac{\text{Lower Bound} + \text{Upper Bound}}{2}$
Plot the midpoints against the frequencies.
Connect the points using straight lines.
Extend the polygon to the x-axis at both ends for closure.

Class Interval	Midpoint	Frequency (f)
40 – 49	44.5	3
50 – 59	54.5	5
60 – 69	64.5	8
70 – 79	74.5	10
80 – 89	84.5	7
90 – 99	94.5	4
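The midpoints in this table, and the points needed to draw the polygon, can be computed with a sketch like the one below. Closing the polygon one class width beyond the first and last midpoints is a common convention and is assumed here; the notes only say to extend the polygon to the x-axis.

```python
class_intervals = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [3, 5, 8, 10, 7, 4]

# Midpoint = (lower bound + upper bound) / 2
midpoints = [(lower + upper) / 2 for lower, upper in class_intervals]
class_width = midpoints[1] - midpoints[0]

# Add a zero-frequency point one class width beyond each end to close the polygon.
polygon_points = (
    [(midpoints[0] - class_width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + class_width, 0)]
)

for x, y in polygon_points:
    print(f"{x}\t{y}")
```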

Shows overall trends in data distribution.
Easier to compare multiple distributions on the same graph.
Useful in understanding fluctuations over intervals.

An ogive is a graph that represents cumulative frequencies, showing how data accumulates over intervals. There are two types:

Less than ogive: For each class, plots the cumulative frequency of values below its upper boundary.
More than ogive: For each class, plots the cumulative frequency of values at or above its lower boundary.

The x-axis represents class boundaries.
The y-axis represents cumulative frequencies.
A smooth curve is drawn through the plotted points.

Create a cumulative frequency table:

Class Interval	Frequency (f)	Cumulative Frequency (Less than)
40 – 49	3	3
50 – 59	5	3 + 5 = 8
60 – 69	8	8 + 8 = 16
70 – 79	10	16 + 10 = 26
80 – 89	7	26 + 7 = 33
90 – 99	4	33 + 4 = 37

Plot cumulative frequencies against the class boundaries.
Draw a smooth curve through the points.

Helps in determining median and percentiles.
Useful in comparing distributions.
Shows cumulative trends effectively.

Frequency distributions and their graphical representations—histograms, frequency polygons, and ogives—are essential tools in statistics. They provide insights into data distribution, trends, and patterns, making it easier to interpret and analyze large datasets. These methods are widely used in research, business, healthcare, and other fields to make data-driven decisions. Understanding how to construct and interpret these graphs is fundamental for statistical analysis and real-world applications.

UNIT-2(2.3) Measures of Central tendency: Calculation of Mean, Median and Mode.

सांख्यिकी (Statistics) में, केन्द्रीय प्रवृत्ति के माप (Measures of Central Tendency) का उपयोग डेटा के केंद्रीय या सामान्य मान को समझने के लिए किया जाता है। तीन प्रमुख केन्द्रीय प्रवृत्ति के माप होते हैं:

माध्य (Mean) – सभी मानों के योग को कुल मानों की संख्या से विभाजित करके प्राप्त किया जाता है।
माध्यिका (Median) – जब डेटा को क्रमबद्ध किया जाता है, तो यह मध्य मान होता है।
बहुलक (Mode) – डेटा में सबसे अधिक बार आने वाला मान।

ये तीनों माप डेटा के वितरण को अलग-अलग तरीकों से व्याख्या करने में मदद करते हैं। इस निबंध में हम माध्य, माध्यिका और बहुलक की परिभाषा, सूत्र और उनकी गणना को विस्तृत रूप से समझेंगे।

माध्य को आमतौर पर औसत (Average) कहा जाता है। यह सभी डेटा मानों के योग को कुल मानों की संख्या से विभाजित करके प्राप्त किया जाता है।

$$\text{माध्य} (\bar{X}) = \frac{\sum X}{N}$$

जहाँ,

$\sum X$ = सभी मानों का योग
$N$ = कुल मानों की संख्या

$$\text{माध्य} (\bar{X}) = \frac{\sum fX}{\sum f}$$

जहाँ,

$f$ = प्रत्येक वर्ग की आवृत्ति
$X$ = प्रत्येक वर्ग का मध्य बिंदु
$\sum fX$ = सभी वर्गों के मध्य बिंदु और उनकी आवृत्तियों के गुणनफल का योग
$\sum f$ = कुल आवृत्ति

मान लीजिए 5 छात्रों के अंक इस प्रकार हैं: 40, 50, 60, 70, 80

$$\text{माध्य} = \frac{40 + 50 + 60 + 70 + 80}{5} = \frac{300}{5} = 60$$

नीचे दिए गए डेटा को देखें:

वर्ग अंतराल (Class Interval)	आवृत्ति (f)	मध्य बिंदु (X)	fX
10 – 20	3	15	45
20 – 30	5	25	125
30 – 40	7	35	245
40 – 50	10	45	450
50 – 60	5	55	275
योग (Total)	30		1140

$$\text{माध्य} = \frac{1140}{30} = 38$$

इस प्रकार, इस डेटा का माध्य 38 है।

माध्यिका एक ऐसा मान है जो डेटा को दो समान भागों में विभाजित करता है।

डेटा को आरोही क्रम (Ascending Order) में व्यवस्थित करें।
माध्यिका का स्थान निकालें: $\text{माध्यिका स्थान} = \frac{N+1}{2}$, जहाँ $N$ कुल मानों की संख्या है।
यदि $N$ विषम है, तो माध्यिका सीधा मध्य मान होगा।
यदि $N$ सम है, तो माध्यिका दो मध्य मानों का औसत होगा।

$$\text{माध्यिका} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times h$$

जहाँ:

$L$ = माध्यिका वर्ग की निम्न सीमा
$N$ = कुल आवृत्ति
$CF$ = माध्यिका वर्ग के पहले की संचयी आवृत्ति
$f$ = माध्यिका वर्ग की आवृत्ति
$h$ = वर्ग चौड़ाई

मान लीजिए, डेटा इस प्रकार है: 25, 30, 35, 40, 45, 50, 55

यहाँ $N = 7$ (विषम संख्या), इसलिए:

$$\text{माध्यिका स्थान} = \frac{7+1}{2} = 4$$

इसलिए, 4वाँ मान = 40, अतः माध्यिका = 40।

यदि डेटा होता: 25, 30, 35, 40, 45, 50 (सम संख्या, $N = 6$),

$$\text{माध्यिका} = \frac{35 + 40}{2} = 37.5$$

बहुलक वह मान होता है जो डेटा में सबसे अधिक बार आता है।

असमूहीकृत डेटा में सबसे अधिक बार आने वाला मान खोजें।
समूहीकृत डेटा में सबसे अधिक आवृत्ति वाले वर्ग (Modal Class) की पहचान करें।
निम्न सूत्र का प्रयोग करें:

$$\text{बहुलक} = L + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

जहाँ:

$L$ = बहुलक वर्ग की निम्न सीमा
$f_1$ = बहुलक वर्ग की आवृत्ति
$f_0$ = बहुलक वर्ग से पहले की आवृत्ति
$f_2$ = बहुलक वर्ग के बाद की आवृत्ति
$h$ = वर्ग चौड़ाई

यदि डेटा इस प्रकार है: 2, 3, 3, 5, 6, 3, 8, 9, 3

तो सबसे अधिक बार 3 आता है, अतः बहुलक = 3।

वर्ग अंतराल	आवृत्ति (f)
10 – 20	3
20 – 30	7
30 – 40	12
40 – 50	8
50 – 60	5

$$\text{बहुलक} = 30 + \left( \frac{12 - 7}{2 \times 12 - 7 - 8} \right) \times 10 = 30 + \frac{5}{9} \times 10 \approx 30 + 5.56 = 35.56$$

अतः बहुलक = 35.56।

माध्य, माध्यिका और बहुलक डेटा का केंद्रीय मान ज्ञात करने के लिए महत्वपूर्ण सांख्यिकीय माप हैं। माध्य औसत को दर्शाता है, माध्यिका मध्य मान को बताता है, और बहुलक सबसे अधिक बार आने वाले मान को इंगित करता है। ये उपाय अनुसंधान, व्यापार, अर्थशास्त्र, और विज्ञान में निर्णय लेने के लिए आवश्यक होते हैं।

UNIT-2(2.3) Measures of Central tendency: Calculation of Mean, Median and Mode.

In statistics, measures of central tendency are used to describe the central or typical value of a dataset. The three most commonly used measures are:

Mean (Average) – The sum of all values divided by the number of values.
Median – The middle value when data is arranged in order.
Mode – The most frequently occurring value(s) in the dataset.

Each measure provides a different perspective on data distribution and is useful in different scenarios. In this essay, we will explore the definitions, formulas, and step-by-step calculations of mean, median, and mode for both ungrouped and grouped data with examples.

The mean is the sum of all observations divided by the total number of observations. It is the most commonly used measure of central tendency.

$$\text{Mean} (\bar{X}) = \frac{\sum X}{N}$$

where:

$\sum X$ = Sum of all values
$N$ = Total number of values

$$\text{Mean} (\bar{X}) = \frac{\sum fX}{\sum f}$$

where:

$f$ = Frequency of each class
$X$ = Midpoint of each class interval
$\sum fX$ = Sum of the products of midpoints and frequencies
$\sum f$ = Total frequency

Consider the marks of 5 students: 40, 50, 60, 70, 80

$$\text{Mean} = \frac{40 + 50 + 60 + 70 + 80}{5} = \frac{300}{5} = 60$$

Consider the following grouped frequency distribution:

Class Interval	Frequency (f)	Midpoint (X)	fX
10 – 20	3	15	45
20 – 30	5	25	125
30 – 40	7	35	245
40 – 50	10	45	450
50 – 60	5	55	275
Total	30		1140

$$\text{Mean} = \frac{1140}{30} = 38$$

Thus, the mean of this dataset is 38.
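The grouped-mean calculation can be checked with a brief Python sketch. It uses the class intervals and frequencies from the table above; the tuple layout `(lower, upper, frequency)` is an illustrative choice.

```python
# (lower bound, upper bound, frequency) for each class in the table above.
classes = [(10, 20, 3), (20, 30, 5), (30, 40, 7), (40, 50, 10), (50, 60, 5)]

sum_f = sum(f for _, _, f in classes)

# Each class is represented by its midpoint X = (lower + upper) / 2.
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = sum_fx / sum_f
print(sum_f, sum_fx, grouped_mean)
```

Because every observation in a class is treated as if it sat at the class midpoint, the grouped mean is an approximation of the mean of the raw data.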

The median is the middle value of an ordered dataset. It divides the data into two equal halves.

Arrange the data in ascending order.
Find the position of the median using the formula: $\text{Median Position} = \frac{N+1}{2}$, where $N$ is the total number of values.
If $N$ is odd, the median is the middle value.
If $N$ is even, the median is the average of the two middle values.

$$\text{Median} = L + \left( \frac{\frac{N}{2} - CF}{f} \right) \times h$$

where:

$L$ = Lower boundary of the median class
$N$ = Total frequency
$CF$ = Cumulative frequency before the median class
$f$ = Frequency of the median class
$h$ = Class width

Consider the dataset: 25, 30, 35, 40, 45, 50, 55

Total values: $N = 7$ (odd), so the median position is:

$$\text{Median Position} = \frac{7+1}{2} = 4$$

Thus, the 4th value is 40, so median = 40.

If the dataset were: 25, 30, 35, 40, 45, 50 (even, $N = 6$),

$$\text{Median} = \frac{35 + 40}{2} = 37.5$$
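The odd and even cases for ungrouped data can be sketched in Python as below. The function name `ungrouped_median` is an illustrative choice; the standard library's `statistics.median` follows the same rule and is used only as a cross-check.

```python
import statistics

def ungrouped_median(values):
    """Middle value for odd N, mean of the two middle values for even N."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_case = ungrouped_median([25, 30, 35, 40, 45, 50, 55])
even_case = ungrouped_median([25, 30, 35, 40, 45, 50])
print(odd_case, even_case)

# Cross-check against the standard library.
print(statistics.median([25, 30, 35, 40, 45, 50]))
```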

Consider the dataset:

Class Interval	Frequency (f)	Cumulative Frequency (CF)
10 – 20	3	3
20 – 30	5	8
30 – 40	7	15
40 – 50	10	25
50 – 60	5	30

Total $N = 30$, so $N/2 = 15$. The median class is 30 – 40, the first class whose cumulative frequency reaches 15. Here $L = 30$, $CF = 8$, $f = 7$, and $h = 10$.

$$\text{Median} = 30 + \left( \frac{15 - 8}{7} \right) \times 10 = 30 + \left( \frac{7}{7} \right) \times 10 = 30 + 10 = 40$$

Thus, the median = 40.
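A hedged Python sketch of the grouped-median formula, using the table above. The function name and the `(lower, upper, frequency)` layout are assumptions for illustration. The median class is taken as the first class whose cumulative frequency reaches N/2, as in the worked example.

```python
def grouped_median(classes):
    """classes: (lower boundary, upper boundary, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("median of an empty distribution is undefined")
    half = n / 2
    cumulative = 0  # CF: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            # Median = L + ((N/2 - CF) / f) * h
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f

classes = [(10, 20, 3), (20, 30, 5), (30, 40, 7), (40, 50, 10), (50, 60, 5)]
median_value = grouped_median(classes)
print(median_value)
```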

The mode is the value that appears most frequently in a dataset. It is useful for categorical, discrete, and continuous data.

Identify the most frequently occurring value in ungrouped data.
In grouped data, find the modal class (class with the highest frequency).
Use the formula for grouped data:

$$\text{Mode} = L + \left( \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \right) \times h$$

where:

$L$ = Lower boundary of the modal class
$f_1$ = Frequency of the modal class
$f_0$ = Frequency of the class before the modal class
$f_2$ = Frequency of the class after the modal class
$h$ = Class width

Given the data: 2, 3, 3, 5, 6, 3, 8, 9, 3

Since 3 appears the most times, the mode = 3.

Class Interval	Frequency (f)
10 – 20	3
20 – 30	7
30 – 40	12
40 – 50	8
50 – 60	5

Using the formula:

$$\text{Mode} = 30 + \left( \frac{12 - 7}{2 \times 12 - 7 - 8} \right) \times 10 = 30 + \left( \frac{5}{24 - 15} \right) \times 10 = 30 + \left( \frac{5}{9} \right) \times 10 \approx 30 + 5.56 = 35.56$$

Thus, mode = 35.56.
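The grouped-mode formula can be sketched in Python as follows. The function name is illustrative. Treating a missing neighbouring class as having frequency 0 when the modal class is first or last is an assumption, not something the notes specify; the formula is also undefined when the denominator is zero, which the sketch reports as an error.

```python
def grouped_mode(classes):
    """classes: (lower boundary, upper boundary, frequency), in ascending order."""
    frequencies = [f for _, _, f in classes]
    i = frequencies.index(max(frequencies))  # first class with the highest frequency
    lower, upper, f1 = classes[i]
    f0 = frequencies[i - 1] if i > 0 else 0                      # class before
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # class after
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode formula is undefined for this distribution")
    # Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h
    return lower + (f1 - f0) / denominator * (upper - lower)

classes = [(10, 20, 3), (20, 30, 7), (30, 40, 12), (40, 50, 8), (50, 60, 5)]
mode_value = grouped_mode(classes)
print(round(mode_value, 2))
```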

The three measures of central tendency—mean, median, and mode—provide different insights into a dataset. The mean is most affected by extreme values, while the median is more robust, and the mode is useful for identifying common values. These measures are widely used in research, economics, psychology, and business analytics for data analysis and decision-making. Understanding how to calculate and interpret them is essential for statistical studies.

UNIT-2(2.4) Measures of Variability: Calculation of Range, QD, AD, SD.

सांख्यिकी (Statistics) में, प्रसरण (Variability) यह दर्शाता है कि किसी डेटा सेट के मान कितने फैले हुए या एक-दूसरे से कितने भिन्न हैं। जबकि केन्द्रीय प्रवृत्ति के माप (Measures of Central Tendency) (जैसे माध्य, माध्यिका और बहुलक) डेटा के केंद्रीय मान को व्यक्त करते हैं, प्रसरण के माप डेटा की विविधता या फैलाव को दर्शाते हैं।

प्रसरण के चार प्रमुख माप निम्नलिखित हैं:

सीमा (Range) – अधिकतम और न्यूनतम मान के बीच का अंतर।
चतुर्थक विचलन (Quartile Deviation – QD) – मध्य 50% डेटा की प्रसरण सीमा।
माध्य परास विचलन (Mean Absolute Deviation – AD) – प्रत्येक डेटा बिंदु और माध्य के बीच औसत विचलन।
मानक विचलन (Standard Deviation – SD) – डेटा के माध्य से औसत दूरी का एक सटीक माप।

इस लेख में, हम इन चारों मापों की परिभाषा, सूत्र और गणना को विस्तार से समझेंगे।

सीमा (Range) प्रसरण का सबसे सरल माप है। यह डेटा सेट में अधिकतम और न्यूनतम मान के बीच के अंतर को दर्शाता है।

$$\text{सीमा} = \text{अधिकतम मान} - \text{न्यूनतम मान}$$

मान लीजिए हमारे पास निम्नलिखित डेटा है: 5, 12, 20, 25, 30

$$\text{सीमा} = 30 - 5 = 25$$

नीचे दिया गया डेटा देखें:

वर्ग अंतराल	आवृत्ति (f)
10 – 20	5
20 – 30	10
30 – 40	7
40 – 50	8
50 – 60	5

$$\text{सीमा} = 60 - 10 = 50$$

यह केवल दो चरम मानों पर निर्भर करता है, जिससे यह आउटलायर्स (चरम मानों) से प्रभावित होता है।
यह डेटा के वितरण की पूरी जानकारी नहीं देता।

चतुर्थक विचलन (QD) यह दर्शाता है कि डेटा के मध्य 50% मान कितने फैले हुए हैं। यह तीसरे चतुर्थक (Q₃) और पहले चतुर्थक (Q₁) के बीच के अंतर का आधा होता है, जिसे अर्द्ध-चतुर्थक परास (Semi-Interquartile Range) भी कहा जाता है।

$$\text{QD} = \frac{Q_3 - Q_1}{2}$$

जहाँ:

Q₁ (प्रथम चतुर्थक) = वह मान जिसके नीचे 25% डेटा स्थित होता है।
Q₃ (तृतीय चतुर्थक) = वह मान जिसके नीचे 75% डेटा स्थित होता है।

मान लीजिए हमारे पास निम्नलिखित डेटा है: 10, 15, 20, 25, 30, 35, 40, 45, 50

Q₁ स्थान = $\frac{N+1}{4} = \frac{9+1}{4} = 2.5$
Q₁ = 15 + 0.5 × (20 – 15) = 17.5
Q₃ स्थान = $3 \times \frac{N+1}{4} = 3 \times \frac{10}{4} = 7.5$
Q₃ = 40 + 0.5 × (45 – 40) = 42.5

$$\text{QD} = \frac{42.5 - 17.5}{2} = \frac{25}{2} = 12.5$$

माध्य परास विचलन (AD) प्रत्येक डेटा बिंदु और माध्य ($\bar{X}$) के बीच औसत विचलन को मापता है।

$$\text{AD} = \frac{\sum |X - \bar{X}|}{N}$$

डेटा: 5, 10, 15, 20, 25

माध्य ($\bar{X}$) निकालें:

$$\bar{X} = \frac{5 + 10 + 15 + 20 + 25}{5} = 15$$

प्रत्येक मान का माध्य से विचलन लें:

X	$|X - \bar{X}|$
5	$|5 - 15| = 10$
10	$|10 - 15| = 5$
15	$|15 - 15| = 0$
20	$|20 - 15| = 5$
25	$|25 - 15| = 10$

AD निकालें:

$$\text{AD} = \frac{10 + 5 + 0 + 5 + 10}{5} = \frac{30}{5} = 6$$

मानक विचलन (SD) डेटा के माध्य से औसत विचलन को दर्शाता है और प्रसरण का सबसे महत्वपूर्ण माप है।

$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

X	$X - \bar{X}$	$(X - \bar{X})^2$
5	-10	100
10	-5	25
15	0	0
20	5	25
25	10	100

$$\sigma^2 = \frac{100 + 25 + 0 + 25 + 100}{5} = \frac{250}{5} = 50$$

$$\sigma = \sqrt{50} \approx 7.07$$

प्रसरण के ये माप (Range, QD, AD, और SD) डेटा के फैलाव को समझने में मदद करते हैं।

सीमा सरल है, लेकिन चरम मानों से प्रभावित होती है।
चतुर्थक विचलन मध्य 50% डेटा की विविधता दिखाता है।
माध्य परास विचलन डेटा की वास्तविक दूरी को दर्शाता है।
मानक विचलन सबसे सटीक और व्यापक रूप से उपयोग किया जाने वाला माप है।

ये माप अर्थशास्त्र, व्यवसाय और अनुसंधान में महत्वपूर्ण भूमिका निभाते हैं।

UNIT-2(2.4) Measures of Variability: Calculation of Range, QD, AD, SD.

In statistics, variability refers to the extent to which data points in a dataset differ from each other. While measures of central tendency (mean, median, and mode) provide information about the central value of a dataset, measures of variability describe how spread out or dispersed the data is.

The most commonly used measures of variability are:

Range – The difference between the maximum and minimum values.
Quartile Deviation (QD) – The spread of the middle 50% of the data.
Mean Absolute Deviation (AD) – The average of the absolute differences between each data point and the mean.
Standard Deviation (SD) – The most widely used measure, which tells us how much data deviates from the mean.

In this essay, we will discuss the meaning, formulas, and calculations of each measure with examples.

The range is the simplest measure of variability. It is calculated as the difference between the highest and lowest values in a dataset.

$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$

Consider the dataset: 5, 12, 20, 25, 30

$$\text{Range} = 30 - 5 = 25$$

Consider the following grouped frequency distribution:

Class Interval	Frequency
10 – 20	5
20 – 30	10
30 – 40	7
40 – 50	8
50 – 60	5

$$\text{Range} = 60 - 10 = 50$$
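Both range calculations can be reproduced with a few lines of Python. For the grouped case the sketch takes the upper boundary of the last class minus the lower boundary of the first class, as in the worked example; the variable names are illustrative.

```python
# Ungrouped data: highest value minus lowest value.
values = [5, 12, 20, 25, 30]
ungrouped_range = max(values) - min(values)

# Grouped data: (lower boundary, upper boundary, frequency), in ascending order.
classes = [(10, 20, 5), (20, 30, 10), (30, 40, 7), (40, 50, 8), (50, 60, 5)]
grouped_range = classes[-1][1] - classes[0][0]

print(ungrouped_range, grouped_range)
```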

Affected by extreme values (outliers).
Ignores the distribution of data between the maximum and minimum values.

Quartile Deviation (QD) measures the spread of the middle 50% of the dataset. It is half the difference between the third quartile (Q₃) and the first quartile (Q₁).

$$\text{Quartile Deviation} (QD) = \frac{Q_3 - Q_1}{2}$$

where:

Q₁ (First Quartile) = The value below which 25% of the data lies.
Q₃ (Third Quartile) = The value below which 75% of the data lies.

Consider the dataset: 10, 15, 20, 25, 30, 35, 40, 45, 50

Q₁ position = $\frac{N+1}{4} = \frac{9+1}{4} = 2.5$

Q₁ = (2nd value) + 0.5 × (3rd value – 2nd value)
Q₁ = 15 + 0.5 × (20 – 15) = 17.5

Q₃ position = $3 \times \frac{N+1}{4} = 3 \times \frac{10}{4} = 7.5$

Q₃ = (7th value) + 0.5 × (8th value – 7th value)
Q₃ = 40 + 0.5 × (45 – 40) = 42.5

$$\text{QD} = \frac{42.5 - 17.5}{2} = \frac{25}{2} = 12.5$$
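The quartile positions and the interpolation step can be sketched in Python as below. The helper name `quartile` is an illustrative choice, and the sketch follows the (N+1)/4 position rule used in the worked example. Other software may use slightly different quartile conventions, and the rule needs at least three values.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k*(N+1)/4, interpolated."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this position rule needs at least 3 values")
    position = k * (n + 1) / 4
    whole = int(position)          # 1-based rank of the value just below the position
    fraction = position - whole    # how far to move towards the next value
    lower_value = ordered[whole - 1]
    if fraction == 0:
        return lower_value
    return lower_value + fraction * (ordered[whole] - lower_value)

data = [10, 15, 20, 25, 30, 35, 40, 45, 50]
q1 = quartile(data, 1)
q3 = quartile(data, 3)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)
```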

Mean Absolute Deviation (AD) is the average of the absolute differences between each data point and the mean. It is a measure of how much data varies from the mean.

For a dataset with values $X_1, X_2, \dots, X_N$:

$$\text{AD} = \frac{\sum |X - \bar{X}|}{N}$$

where:

$\bar{X}$ = Mean
$N$ = Total number of observations

Consider the dataset: 5, 10, 15, 20, 25

Calculate the mean:

$$\bar{X} = \frac{5 + 10 + 15 + 20 + 25}{5} = 15$$

Find absolute deviations from the mean:

X	$|X - \bar{X}|$
5	$|5 - 15| = 10$
10	$|10 - 15| = 5$
15	$|15 - 15| = 0$
20	$|20 - 15| = 5$
25	$|25 - 15| = 10$

Calculate AD:

$$\text{AD} = \frac{10 + 5 + 0 + 5 + 10}{5} = \frac{30}{5} = 6$$
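The same three steps (mean, absolute deviations, average) can be written as a short Python sketch. The variable names are illustrative.

```python
values = [5, 10, 15, 20, 25]

# Step 1: mean of the data.
mean = sum(values) / len(values)

# Step 2: absolute deviation of each value from the mean.
absolute_deviations = [abs(x - mean) for x in values]

# Step 3: average of the absolute deviations.
average_deviation = sum(absolute_deviations) / len(values)

print(mean, absolute_deviations, average_deviation)
```

Taking absolute values matters: without them the positive and negative deviations from the mean would cancel and always sum to zero.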

Standard Deviation (SD) is the square root of the average squared deviation of values from the mean.

For ungrouped data:

$$\text{SD} (\sigma) = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

For grouped data:

$$\text{SD} (\sigma) = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$$

Consider the dataset: 5, 10, 15, 20, 25

Find the mean: $\bar{X} = 15$
Find squared deviations from the mean:

X	$X - \bar{X}$	$(X - \bar{X})^2$
5	-10	100
10	-5	25
15	0	0
20	5	25
25	10	100

Calculate variance:

$$\sigma^2 = \frac{100 + 25 + 0 + 25 + 100}{5} = \frac{250}{5} = 50$$

Calculate SD:

$$\sigma = \sqrt{50} \approx 7.07$$
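A brief Python sketch of the same calculation, dividing by N as in the formula above. The cross-check uses `statistics.pstdev` from the standard library, which also divides by N; `statistics.stdev` divides by N − 1 instead and would give a larger value.

```python
import math
import statistics

values = [5, 10, 15, 20, 25]
mean = sum(values) / len(values)

# Variance: average of the squared deviations from the mean (divide by N).
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print(variance, round(standard_deviation, 2))

# Cross-check with the standard library's population standard deviation.
print(round(statistics.pstdev(values), 2))
```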

Measures of variability (Range, QD, AD, and SD) are essential to understand the spread of data.

Range is simple but affected by outliers.
Quartile Deviation focuses on the middle 50% of data.
Mean Absolute Deviation gives a straightforward measure of dispersion.
Standard Deviation is the most reliable measure, widely used in statistics and research.

Understanding these measures helps in making better decisions, especially in fields like economics, psychology, and business analytics.

UNIT-3(3.1) Correlation: Concept, Types of correlation.

सहसंबंध (Correlation) सांख्यिकी (Statistics) का एक महत्वपूर्ण विषय है, जो दो या अधिक चरों (Variables) के बीच के संबंध को मापता है। यह यह बताने में मदद करता है कि यदि एक चर बढ़ता या घटता है, तो दूसरा चर किस प्रकार प्रभावित होता है।

सहसंबंध का उपयोग मनोविज्ञान, अर्थशास्त्र, व्यवसाय, चिकित्सा और शिक्षा सहित कई क्षेत्रों में किया जाता है। उदाहरण के लिए, मनोविज्ञान में यह पता लगाया जा सकता है कि अध्ययन के घंटों और परीक्षा के अंकों के बीच कोई संबंध है या नहीं।

इस लेख में, हम निम्नलिखित विषयों को विस्तार से समझेंगे:

सहसंबंध की अवधारणा
सहसंबंध के प्रकार
वास्तविक जीवन में सहसंबंध के उदाहरण

सहसंबंध वह सांख्यिकीय माप है जो यह दर्शाता है कि दो चर एक दूसरे से कितने जुड़े हुए हैं। यह यह निर्धारित करता है कि यदि एक चर बढ़ता या घटता है, तो दूसरा चर किस हद तक उसी दिशा में बढ़ेगा या घटेगा।

दिशा (Direction) – सहसंबंध सकारात्मक (Positive), नकारात्मक (Negative), या शून्य (Zero) हो सकता है।
शक्ति (Strength) – सहसंबंध का परिमाण सहसंबंध गुणांक (Correlation Coefficient) द्वारा मापा जाता है।
संतुलन (Symmetry) – $X$ और $Y$ के बीच सहसंबंध समान होता है, चाहे उसे $X$ से $Y$ की ओर या $Y$ से $X$ की ओर मापा जाए।

सहसंबंध को पियर्सन सहसंबंध गुणांक (Pearson Correlation Coefficient, r) द्वारा मापा जाता है, जिसका सूत्र निम्नलिखित है:

$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$$

जहाँ:

$r$ = सहसंबंध गुणांक
$X, Y$ = दो चर
$n$ = कुल डेटा बिंदु की संख्या

सहसंबंध का मान -1 और +1 के बीच होता है:

$r = +1$ → पूर्णत: सकारात्मक सहसंबंध
$r = -1$ → पूर्णत: नकारात्मक सहसंबंध
$r = 0$ → कोई सहसंबंध नहीं

सहसंबंध को दिशा, चरों की संख्या, और मापने की विधि के आधार पर वर्गीकृत किया जा सकता है।

जब एक चर बढ़ता है, तो दूसरा भी बढ़ता है और जब एक घटता है, तो दूसरा भी घटता है।
उदाहरण: लंबाई और वजन – अधिक लंबाई वाले लोग आमतौर पर अधिक वजन वाले होते हैं।
ग्राफ़ पर यह ऊपर की ओर ढलान के रूप में दिखता है।

जब एक चर बढ़ता है, तो दूसरा घटता है और जब एक घटता है, तो दूसरा बढ़ता है।
उदाहरण: तनाव और नींद का समय – अधिक तनाव से कम नींद आती है।
ग्राफ़ पर यह नीचे की ओर ढलान के रूप में दिखता है।

जब दोनों चरों के बीच कोई संबंध नहीं होता।
उदाहरण: जूते का आकार और बुद्धिमत्ता – इन दोनों के बीच कोई संबंध नहीं है।
ग्राफ़ पर डेटा बिंदु बिना किसी पैटर्न के बिखरे हुए दिखते हैं।

जब केवल दो चर शामिल होते हैं।
उदाहरण: तापमान और आइसक्रीम की बिक्री।

जब तीन या अधिक चर शामिल होते हैं।
उदाहरण: वेतन, कार्य-अनुभव, और शिक्षा स्तर के बीच संबंध।

जब तीसरे चर के प्रभाव को नियंत्रित करने के बाद दो चरों के बीच सहसंबंध की गणना की जाती है।
उदाहरण: व्यायाम और वजन घटाने का संबंध, लेकिन आहार के प्रभाव को हटाकर।

जब दोनों चर सतत (Continuous) और रैखिक (Linear) होते हैं।

जब डेटा क्रमबद्ध (Ranked) हो।
उदाहरण: छात्रों के परीक्षा अंक और खेल प्रदर्शन रैंक।
सूत्र:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

छोटे डेटा सेट्स के लिए अधिक सटीक विधि।
जब डेटा क्रमबद्ध (Ordinal) हो।

जब एक चर सतत और दूसरा द्विचर (Binary: Yes/No, 0/1) हो।
उदाहरण: लिंग (पुरुष/महिला) और परीक्षा अंकों का संबंध।

जब दोनों चर द्विचर (Binary) हों।
उदाहरण: धूम्रपान (हाँ/नहीं) और फेफड़ों के कैंसर (हाँ/नहीं) का संबंध।

आईक्यू और शैक्षणिक प्रदर्शन → सकारात्मक सहसंबंध।
तनाव और मानसिक स्वास्थ्य → नकारात्मक सहसंबंध।

विज्ञापन और बिक्री → सकारात्मक सहसंबंध।
मुद्रास्फीति और क्रय शक्ति → नकारात्मक सहसंबंध।

धूम्रपान और फेफड़ों के कैंसर का खतरा → सकारात्मक सहसंबंध।
व्यायाम और कोलेस्ट्रॉल स्तर → नकारात्मक सहसंबंध।

अध्ययन का समय और परीक्षा अंक → सकारात्मक सहसंबंध।
अनुपस्थिति और अकादमिक प्रदर्शन → नकारात्मक सहसंबंध।

सहसंबंध कारणता (Causation) को सिद्ध नहीं करता।
गैर-रैखिक संबंधों को पहचान नहीं सकता।
अत्यधिक मूल्यों (Outliers) से प्रभावित होता है।
गुप्त चरों का प्रभाव अनदेखा करता है।

सहसंबंध सांख्यिकी में एक महत्वपूर्ण टूल है, जो डेटा में पैटर्न और रुझान की पहचान करने में मदद करता है। विभिन्न प्रकार के सहसंबंध अलग-अलग परिस्थितियों में उपयोग किए जाते हैं। हालाँकि, यह समझना महत्वपूर्ण है कि सहसंबंध केवल दो चरों के बीच संबंध को दर्शाता है, यह यह साबित नहीं करता कि एक चर दूसरे का कारण है।

UNIT-3(3.1) Correlation: Concept, Types of correlation.

Correlation is a fundamental concept in statistics that measures the relationship between two or more variables. It helps in understanding whether an increase or decrease in one variable is associated with an increase or decrease in another variable. Correlation is widely used in various fields such as psychology, economics, business, and medical research to identify patterns and relationships between data points.

For example, in psychology, correlation can help determine whether there is a relationship between study hours and exam scores or between stress levels and sleep duration. In business, it can show whether higher advertising expenditure is associated with higher sales.

This essay will cover:

The concept of correlation
The types of correlation
Examples of correlation in real-life scenarios

Correlation refers to the statistical relationship between two variables, indicating how one variable tends to change as the other changes. It does not establish causation (a cause-and-effect relationship), but it helps identify patterns and trends in data.

Direction: Correlation can be positive, negative, or zero (no correlation).
Strength: The correlation coefficient determines the strength of the relationship between variables.
Symmetry: Correlation between variable X and variable Y is the same as between Y and X.

Correlation is usually measured using the Pearson Correlation Coefficient (r), which is calculated as:

$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$$

where:

$r$ = correlation coefficient
$X, Y$ = variables
$n$ = number of data points

The value of $r$ always lies between -1 and +1.
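The raw-score formula for r can be turned into a short Python sketch. The notes give no dataset for this formula, so the study-hours and exam-score figures below are hypothetical, invented for illustration; the function name `pearson_r` is likewise an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the raw-score formula: sums of X, Y, XY, X^2 and Y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator

# Hypothetical data, invented for illustration only.
study_hours = [1, 2, 3, 4, 5]
exam_scores = [2, 4, 5, 4, 5]

r = pearson_r(study_hours, exam_scores)
print(round(r, 4))
```

Swapping the two lists gives the same value, which illustrates the symmetry property mentioned earlier.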

Correlation can be classified based on direction, number of variables, and method of measurement.

If one variable increases, the other also increases, and vice versa.
Example: Height and weight – Taller people tend to weigh more.
Graphically, this is represented by an upward-sloping trend.

If one variable increases, the other decreases, and vice versa.
Example: Stress and sleep duration – More stress often leads to less sleep.
Represented by a downward-sloping trend.

There is no relationship between the two variables.
Example: Shoe size and intelligence – No connection exists between them.
Graphically, the points appear scattered without a clear pattern.

Involves only two variables.
Example: The relationship between temperature and ice cream sales.

Involves three or more variables.
Example: The correlation between salary, work experience, and education level.

Examines the relationship between two variables while controlling the effect of a third variable.
Example: The correlation between exercise and weight loss, while controlling for diet.

Pearson’s correlation: Measures the linear relationship between two continuous variables.
Values range from -1 to +1:
- r = +1 → Perfect positive correlation
- r = -1 → Perfect negative correlation
- r = 0 → No correlation

Used when data is ordinal (ranked data) instead of continuous.
Example: Ranking students based on marks and ranking their performance in sports.
Formula:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where $d$ is the difference between the two ranks of each observation, and $n$ is the number of observations.

Kendall’s Tau
Similar to Spearman’s Rank Correlation, but often considered more reliable for small datasets.
Used in cases where data are ordinal or non-parametric.

Point-Biserial Correlation
Measures the relationship between one continuous variable and one binary variable (0/1, Yes/No).
Example: Relationship between gender (Male/Female) and test scores.

Phi Coefficient
Used when both variables are binary (dichotomous).
Example: Relationship between smoking (Yes/No) and lung disease (Yes/No).

Psychology
IQ and academic performance → Positive correlation.
Stress and mental health → Negative correlation.

Business and economics
Advertising expenditure and sales → Positive correlation.
Inflation and purchasing power → Negative correlation.

Healthcare
Smoking and lung cancer risk → Positive correlation.
Exercise and cholesterol levels → Negative correlation.

Education
Study time and grades → Positive correlation.
Absenteeism and academic performance → Negative correlation.

Correlation does not imply causation
A high correlation does not mean that one variable causes the other to change.
Example: Ice cream sales and drowning deaths are correlated, but the real cause is hot weather.

Non-linear relationships are not detected
Pearson’s correlation works only for linear relationships.

Outliers affect correlation
Extreme values can distort the correlation coefficient.

Correlation does not account for hidden variables
There may be a third factor influencing both variables.
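
The point about non-linear relationships can be illustrated with a small sketch. The data below are hypothetical (they do not come from these notes): Y is completely determined by X, yet the relationship is U-shaped rather than linear. The helper name `pearson_r` is an assumption introduced only for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the raw-score (product-moment) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: y = x squared, a perfect but U-shaped relationship.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r_curved = pearson_r(x, y)
print(r_curved)  # r is zero even though y depends entirely on x
```

A coefficient of zero here does not mean the variables are unrelated; it only means there is no linear trend, which is why a scatter plot should be inspected before interpreting r.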

Correlation is a crucial statistical tool used to measure relationships between variables. It helps in data analysis across multiple domains, including psychology, business, healthcare, and education. Understanding the types of correlation – positive, negative, and zero – along with various methods like Pearson’s and Spearman’s correlation allows researchers to draw meaningful insights from data.

However, it is important to remember that correlation does not imply causation. A strong correlation between two variables does not mean that one variable directly influences the other. Therefore, correlation should be used carefully, considering other statistical methods to determine causal relationships.

By applying correlation analysis effectively, we can improve decision-making in various fields, from predicting market trends to understanding human behavior.

UNIT-3(3.2) Calculation of Correlation: Product moment and Rank difference method.

Correlation is an important statistical method that measures how strongly, and in what direction, two variables are related. It helps us understand how one variable tends to change when the other changes.

There are several ways to calculate correlation, but the two most prominent are:

Product Moment Method, also called Pearson’s Correlation Coefficient.
Rank Difference Method, also called Spearman’s Rank Correlation Coefficient.

This article discusses the following topics in detail:

Concept and significance of correlation
Product Moment Method (Pearson’s Correlation Coefficient)
Rank Difference Method (Spearman’s Rank Correlation Coefficient)
Real-life applications of correlation
Limitations of correlation

Correlation is a statistical measure of the relationship between two variables. Its value lies between -1 and +1:

+1 → Perfect positive correlation (as one variable increases, the other also increases).
-1 → Perfect negative correlation (as one variable increases, the other decreases).
0 → No correlation (no linear relationship between the two variables).

It helps in predicting one variable from another.
It is widely used in psychology, business, economics, and medical research.
It helps in understanding patterns in data and making better decisions.

This method is used to measure the linear relationship between two continuous variables. It is most suitable for data that are normally distributed.

Pearson’s correlation coefficient ($r$) is calculated using the following formula:

$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$$

where:

$X$ and $Y$ are the two variables.
$n$ is the total number of data points.
$\sum XY$ is the sum of the products of the paired values.
$\sum X$ and $\sum Y$ are the sums of the individual values of each variable.
$\sum X^2$ and $\sum Y^2$ are the sums of the squared values of each variable.

Suppose we want to find the relationship between study hours (X) and exam scores (Y).

Student	Study Hours (X)	Exam Score (Y)	$X^2$	$Y^2$	$XY$
1	2	40	4	1600	80
2	3	50	9	2500	150
3	5	65	25	4225	325
4	6	70	36	4900	420
5	8	90	64	8100	720

Now calculate:

$\sum X = 24$, $\sum Y = 315$
$\sum X^2 = 138$, $\sum Y^2 = 21325$
$\sum XY = 1695$

$$r = \frac{(5)(1695) - (24)(315)}{\sqrt{[5(138) - (24)^2][5(21325) - (315)^2]}} = \frac{915}{\sqrt{114 \times 7400}} = \frac{915}{\sqrt{843600}} = 0.996$$

Since $r \approx 1$, this indicates a very strong positive correlation.

When the data are ranked (ordinal), we use Spearman’s rank correlation. It is also suitable for relationships that are monotonic but not strictly linear.

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where:

$d$ = difference between the ranks of the two variables
$n$ = total number of data points

In the study-hours example above, both variables put the five students in the same rank order, so every $d = 0$ and $\sum d^2 = 0$:

$$r_s = 1 - \frac{6(0)}{5(5^2 - 1)} = 1$$

Since $r_s = 1$, this indicates a perfect positive correlation.

Education: relationship between study hours and exam scores.
Psychology: relationship between stress levels and mental health.
Business: relationship between advertising expenditure and sales.
Medicine: relationship between exercise and cholesterol levels.

Does not explain the cause of a relationship (correlation does not imply causation).
Can be affected by outliers.
Pearson’s coefficient is suitable only for linear relationships.

Pearson’s method is suitable for continuous data with a linear relationship, whereas Spearman’s method is better suited to ranked data. Both methods help researchers and analysts understand relationships in data.

UNIT-3(3.2) Calculation of Correlation: Product moment and Rank difference method.

Correlation is a statistical technique used to measure the strength and direction of a relationship between two variables. It helps researchers and analysts understand how one variable tends to change as another changes. There are multiple ways to calculate correlation, but the Product Moment Method (Pearson’s Correlation Coefficient) and the Rank Difference Method (Spearman’s Rank Correlation Coefficient) are the most commonly used.

This essay will cover:

Concept of correlation and its significance
Product Moment Method (Pearson’s Correlation Coefficient)
Rank Difference Method (Spearman’s Rank Correlation Coefficient)
Real-life applications of correlation
Limitations of correlation

Correlation is a statistical measure that expresses the extent to which two variables are related to each other. It ranges from -1 to +1, where:

+1 indicates a perfect positive correlation (when one variable increases, the other also increases).
-1 indicates a perfect negative correlation (when one variable increases, the other decreases).
0 indicates no correlation (no relationship between the variables).

Helps in predicting one variable based on another.
Used in psychology, business, economics, and medical research.
Helps in identifying patterns and making informed decisions.

The Product Moment Method, also known as Pearson’s Correlation Coefficient, measures the linear relationship between two continuous variables. It is suitable for normally distributed data with a linear relationship.

The Pearson correlation coefficient ($r$) is calculated using the formula:

$$r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$$

where:

$X$ and $Y$ are the two variables.
$n$ is the number of data points.
$\sum XY$ is the sum of the products of paired scores.
$\sum X$ and $\sum Y$ are the sums of the individual values of X and Y.
$\sum X^2$ and $\sum Y^2$ are the sums of the squared values of X and Y.

Suppose we have the following data on students’ study hours (X) and exam scores (Y).

Student	Study Hours (X)	Exam Score (Y)	$X^2$	$Y^2$	$XY$
1	2	40	4	1600	80
2	3	50	9	2500	150
3	5	65	25	4225	325
4	6	70	36	4900	420
5	8	90	64	8100	720

Now, calculate:

$\sum X = 2 + 3 + 5 + 6 + 8 = 24$
$\sum Y = 40 + 50 + 65 + 70 + 90 = 315$
$\sum X^2 = 4 + 9 + 25 + 36 + 64 = 138$
$\sum Y^2 = 1600 + 2500 + 4225 + 4900 + 8100 = 21325$
$\sum XY = 80 + 150 + 325 + 420 + 720 = 1695$

Using the formula:

$$r = \frac{(5)(1695) - (24)(315)}{\sqrt{[5(138) - (24)^2][5(21325) - (315)^2]}} = \frac{8475 - 7560}{\sqrt{[690 - 576][106625 - 99225]}} = \frac{915}{\sqrt{114 \times 7400}} = \frac{915}{\sqrt{843600}} = \frac{915}{918.5} = 0.996$$

Since $r \approx 1$, this indicates a very strong positive correlation between study hours and exam scores.
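
The hand calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_r` and the variable names are assumptions chosen for illustration, and the data are the five students from the table.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the raw-score (product-moment) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)              # sum of X squared
    sum_y2 = sum(v * v for v in y)              # sum of Y squared
    sum_xy = sum(a * b for a, b in zip(x, y))   # sum of XY
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

study_hours = [2, 3, 5, 6, 8]
exam_scores = [40, 50, 65, 70, 90]

r = pearson_r(study_hours, exam_scores)
print(round(r, 3))
```

Because the script keeps full precision at every step, it is a convenient way to catch arithmetic slips in the intermediate products and square roots.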

The Rank Difference Method, also known as Spearman’s Rank Correlation Coefficient, is used when data is ordinal (ranked data) rather than continuous. It measures monotonic relationships (where variables move in the same or opposite direction but not necessarily at a constant rate).

The Spearman rank correlation coefficient ($r_s$) is calculated using:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where:

$d$ is the difference between the ranks of X and Y.
$n$ is the number of observations.

Rank the study hours and exam scores from the previous example (rank 1 = lowest value).

Student	Study Hours (X)	Exam Score (Y)	Rank of X	Rank of Y	$d$	$d^2$
1	2	40	1	1	0	0
2	3	50	2	2	0	0
3	5	65	3	3	0	0
4	6	70	4	4	0	0
5	8	90	5	5	0	0

Now,

$\sum d^2 = 0$
$n = 5$

Using the formula:

$$r_s = 1 - \frac{6(0)}{5(5^2 - 1)} = 1 - 0 = 1$$

Since $r_s = 1$, it indicates a perfect positive correlation between study hours and exam scores.
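
The rank-difference steps can be sketched in code as well. This is an illustrative sketch, not a general-purpose routine: the helper names `ranks` and `spearman_rs` are assumptions, rank 1 is given to the smallest value, and the simple formula used here assumes there are no tied scores.

```python
def ranks(values):
    """Rank values from smallest (rank 1) to largest; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rank-difference coefficient: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

study_hours = [2, 3, 5, 6, 8]
exam_scores = [40, 50, 65, 70, 90]

rs = spearman_rs(study_hours, exam_scores)
print(rs)  # both lists share the same rank order, so every d is 0
```

When scores are tied, the usual practice is to assign average ranks, which this sketch deliberately leaves out.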

Education – Correlation between attendance and academic performance.
Psychology – Relationship between stress levels and mental health.
Business – Impact of advertising expenditure on sales.
Health – Relationship between exercise and cholesterol levels.

Does not imply causation – A strong correlation does not mean one variable causes the other.
Sensitive to outliers – Extreme values can distort correlation values.
Limited to linear relationships – Pearson’s correlation does not work well for non-linear relationships.

The Product Moment Method (Pearson’s Correlation Coefficient) is ideal for continuous and linear data, while the Rank Difference Method (Spearman’s Rank Correlation Coefficient) is better suited for ranked or ordinal data. Both methods provide valuable insights into relationships between variables, helping researchers and analysts make data-driven decisions.

UNIT-3(3.3) Calculation of t-test: Independent group and Correlated group.

The t-test is a statistical test used to compare the means of two groups and to determine whether the difference between them is statistically significant or merely due to chance. It is widely used in fields such as psychology, education, medicine, and business.

There are two main types of t-test:

Independent Samples t-test: used when two separate groups are compared.
Paired (Correlated) Samples t-test: used when the same group is tested under two conditions or at two points in time.

In this article, we will discuss the following points:

Concept of the t-test and its importance
Independent Samples t-test: formula, calculation, and example
Paired (Correlated) Samples t-test: formula, calculation, and example
Interpretation of t-test results
Limitations and assumptions of the t-test

The t-test is a statistical method for assessing the difference between two group means. It is based on the t-distribution and is used especially for small samples (n < 30).

It helps determine whether the difference between two groups is real or merely due to chance.
It is used in research to compare control and experimental groups.
It supports important decisions in psychology, business, education, and medicine.

This test is used when we want to compare two independent groups. For example:

Comparing the exam scores of male and female students.
Comparing the effect of a new drug with that of a placebo.

The formula for the independent samples t-test is:

$$t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where:

$\bar{X_1}$ and $\bar{X_2}$ = means of group 1 and group 2
$s_1^2$ and $s_2^2$ = variances of group 1 and group 2
$n_1$ and $n_2$ = sample sizes of group 1 and group 2

The degrees of freedom (df) are calculated as:

$$df = n_1 + n_2 - 2$$

A researcher wants to test whether there is a significant difference in students’ scores under traditional and modern teaching methods.

Group	Test Scores	Mean ($\bar{X}$)	Variance ($s^2$)	Sample Size (n)
Traditional	50, 55, 52, 48, 53	51.6	7.3	5
Modern	60, 62, 58, 65, 63	61.6	7.3	5

Each variance is the sample variance: the sum of squared deviations from the group mean (29.2 in both groups) divided by n - 1 = 4.

Now apply the formula:

$$t = \frac{51.6 - 61.6}{\sqrt{\frac{7.3}{5} + \frac{7.3}{5}}} = \frac{-10}{\sqrt{1.46 + 1.46}} = \frac{-10}{\sqrt{2.92}} = \frac{-10}{1.71} = -5.85$$

The critical value in the t-table for df = 8 and α = 0.05 (two-tailed) is 2.306. Since |t| = 5.85 > 2.306, we reject the null hypothesis. This means that scores under the modern teaching method were significantly higher than under the traditional method.

When we test the same group at two different points in time, we use the paired t-test. For example:

Comparing students’ scores before and after training.
Comparing employees’ stress levels measured before and after a program.

$$t = \frac{\bar{D}}{s_D / \sqrt{n}}$$

where:

$\bar{D}$ = mean of the paired differences
$s_D$ = standard deviation of the differences
$n$ = number of pairs

Degrees of freedom:

$$df = n - 1$$

A psychologist measures the stress levels of 5 employees before and after a stress management program.

Employee	Before (X)	After (Y)	Difference (D = X - Y)	$D^2$
1	80	72	8	64
2	85	78	7	49
3	78	74	4	16
4	90	83	7	49
5	76	70	6	36

With $\bar{D} = 32/5 = 6.4$ and $s_D = \sqrt{2.3} \approx 1.52$, the calculation gives t ≈ 9.44. The critical value in the t-table for df = 4 and α = 0.05 is 2.776. Since |t| > 2.776, the stress management program was effective.

The data should be normally distributed.
The groups should have equal variances (for the independent t-test).
The observations should be independent (for the independent t-test).

Best suited to small sample sizes.
Can be affected by outliers.
Useful only for comparing two groups.

The independent t-test is used to compare the means of two separate groups.
The paired t-test is used to compare the same group under different conditions.
It helps researchers and analysts identify significant differences between groups.

UNIT-3(3.3) Calculation of t-test: Independent group and Correlated group.

The t-test is a statistical test used to compare the means of two groups and determine whether the differences between them are statistically significant. It is widely used in research fields such as psychology, education, medicine, and business.

There are two main types of t-tests:

Independent Samples t-test (for comparing two separate groups)
Paired (Correlated) Samples t-test (for comparing the same group at different times or conditions)

In this article, we will cover:

The concept of the t-test and its importance
Independent Samples t-test – Formula, Calculation, and Example
Paired (Correlated) Samples t-test – Formula, Calculation, and Example
Interpretation of t-test results
Assumptions and Limitations of t-tests

A t-test is a statistical test used to compare the means of two groups to determine if there is a significant difference between them. It is based on the t-distribution and is especially useful when the sample size is small (n < 30) and the population standard deviation is unknown.

It helps determine if differences between two groups are real or due to random chance.
Used in experimental research to compare control and experimental groups.
Helps in decision-making in psychology, business, education, and healthcare.

The Independent Samples t-test is used when we compare the means of two different (independent) groups. For example, comparing the exam scores of male and female students or the reaction times of two different groups of participants in an experiment.

The formula for an independent samples t-test is:

$$t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Where:

$\bar{X_1}$ and $\bar{X_2}$ = Mean of group 1 and group 2
$s_1^2$ and $s_2^2$ = Variance of group 1 and group 2
$n_1$ and $n_2$ = Sample size of group 1 and group 2

The degrees of freedom (df) are calculated as:

$$df = n_1 + n_2 - 2$$

A researcher wants to test whether there is a significant difference in the test scores of students taught using traditional methods versus modern teaching methods.

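Group	Test Scores	Mean ($\bar{X}$)	Variance ($s^2$)	Sample Size (n)
Traditional	50, 55, 52, 48, 53	51.6	7.3	5
Modern	60, 62, 58, 65, 63	61.6	7.3	5

Each variance is the sample variance: the sum of squared deviations from the group mean (29.2 in both groups) divided by n - 1 = 4.

Now, apply the formula:

$$t = \frac{51.6 - 61.6}{\sqrt{\frac{7.3}{5} + \frac{7.3}{5}}} = \frac{-10}{\sqrt{1.46 + 1.46}} = \frac{-10}{\sqrt{2.92}} = \frac{-10}{1.71} = -5.85$$

The critical value from the t-table for $df = 8$ and α = 0.05 (two-tailed) is 2.306. Since |t| = 5.85 is greater than 2.306, we reject the null hypothesis. This means that the modern teaching method leads to significantly higher scores.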
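
The independent-samples calculation can be reproduced from the raw scores. The sketch below is an illustration under stated assumptions: the function name `independent_t` is hypothetical, and each group's variance is computed as the sample variance (n - 1 denominator) directly from the scores rather than taken from the table.

```python
import math
import statistics

def independent_t(group1, group2):
    """t statistic and df for two independent groups (formula given above)."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1 = statistics.variance(group1)   # sample variance, n - 1 denominator
    var2 = statistics.variance(group2)
    n1, n2 = len(group1), len(group2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    t = (mean1 - mean2) / standard_error
    df = n1 + n2 - 2
    return t, df

traditional = [50, 55, 52, 48, 53]
modern = [60, 62, 58, 65, 63]

t_value, df = independent_t(traditional, modern)
print(round(t_value, 2), df)
```

The sign of t only reflects which group is listed first; the comparison with the critical value uses the absolute value |t|.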

The Paired (Correlated) Samples t-test is used when comparing two related samples or the same group at different times. For example:

Measuring students’ test scores before and after a training program.
Comparing participants’ heart rate before and after an exercise session.

The formula for a paired t-test is:

$$t = \frac{\bar{D}}{s_D / \sqrt{n}}$$

Where:

$\bar{D}$ = Mean of the differences between paired values
$s_D$ = Standard deviation of the differences
$n$ = Number of pairs

The degrees of freedom (df) are calculated as:

$$df = n - 1$$

A psychologist measures the stress levels of 5 employees before and after a stress management program.

Employee	Before (X)	After (Y)	Difference (D = X - Y)	$D^2$
1	80	72	8	64
2	85	78	7	49
3	78	74	4	16
4	90	83	7	49
5	76	70	6	36

Now, calculate:

$\sum D = 8 + 7 + 4 + 7 + 6 = 32$
$\sum D^2 = 64 + 49 + 16 + 49 + 36 = 214$
$\bar{D} = \frac{\sum D}{n} = \frac{32}{5} = 6.4$

Variance of D:

$$s_D^2 = \frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1} = \frac{214 - \frac{(32)^2}{5}}{5 - 1} = \frac{214 - 204.8}{4} = \frac{9.2}{4} = 2.3$$

$$s_D = \sqrt{2.3} = 1.517$$

Now, calculate t:

$$t = \frac{6.4}{1.517 / \sqrt{5}} = \frac{6.4}{0.678} = 9.44$$

The critical value from the t-table for $df = 4$ and α = 0.05 is 2.776. Since |t| = 9.44 is greater than 2.776, we reject the null hypothesis. This means the stress management program significantly reduced stress levels.
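
The paired calculation can be sketched the same way. The function name `paired_t` is an assumption made for illustration. Because the script does not round intermediate values, its result may differ slightly in the second decimal place from a hand calculation that rounds $s_D$ or the standard error along the way.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and df for paired scores, using D = before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)       # sample SD of differences (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

stress_before = [80, 85, 78, 90, 76]
stress_after = [72, 78, 74, 83, 70]

t_paired, df_paired = paired_t(stress_before, stress_after)
print(round(t_paired, 2), df_paired)
```

Working on the differences turns the paired design into a one-sample problem, which is why the degrees of freedom are n - 1 rather than n1 + n2 - 2.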

If $|t|$ is greater than the critical value → Reject the null hypothesis → There is a significant difference.
If $|t|$ is less than the critical value → Fail to reject the null hypothesis → No significant difference.

Data should be normally distributed.
Groups should have equal variances (for independent t-test).
Observations should be independent (for independent t-test).

Not suitable for non-normal data.
Sensitive to outliers.
Works best for small sample sizes.

The Independent Samples t-test is used to compare two separate groups, while the Paired Samples t-test is used for repeated measurements on the same individuals. Understanding these tests helps researchers make informed conclusions about differences in means between groups.

UNIT-4(4.1) Chi square: Concept.

The Chi-Square test is a non-parametric statistical test used to check whether there is a significant association between two categorical variables. It is widely used in fields such as psychology, biology, social sciences, and business to analyze frequency data and test hypotheses.

The test is used mainly with nominal data, where a variable is classified into categories that have no fixed order.

This article will cover the following points:

Concept of the Chi-Square test
Types of Chi-Square tests
- Chi-Square Test for Goodness of Fit
- Chi-Square Test for Independence
Assumptions of the Chi-Square test
Chi-Square test calculation (step-by-step procedure)
Interpretation of results
Applications of the Chi-Square test
Limitations of the Chi-Square test

The Chi-Square test (χ² test) is used to determine whether there is a statistically significant association between two categorical variables.

It measures the difference between observed and expected frequencies and determines whether that difference is merely due to chance or reflects a real relationship between the two variables.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where:

$\chi^2$ = Chi-Square statistic
$O$ = Observed frequency
$E$ = Expected frequency

A large χ² value indicates a large difference between the observed and expected frequencies, which makes it more likely that the two variables are related.

This test is used to check whether a data set conforms to a given theoretical distribution.

Example:
A researcher wants to test whether customers are evenly distributed across three brands (A, B, and C).

Brand	Observed Frequency (O)	Expected Frequency (E)
A	40	50
B	60	50
C	50	50

The test checks whether preferences differ significantly across the brands.

This test is used to find out whether two categorical variables are independent or related.

Example:
A researcher wants to test whether there is a relationship between gender (Male/Female) and voting preference (Party X / Party Y).

Gender	Party X	Party Y
Male	30	20
Female	25	25

If the Chi-Square test shows that the difference is significant, it can be concluded that gender and voting preference are related.

The data should be categorical (e.g., gender, occupation, preference).
All observations should be independent (each data point represents a different subject).
The expected frequency of every category should be at least 5.
The sample size should be sufficiently large.

A survey examines whether there is a relationship between exercise habits (Regular/Irregular) and heart disease (Yes/No).

Exercise Habit	Heart Disease (Yes)	Heart Disease (No)	Total
Regular	30	70	100
Irregular	50	50	100
Total	80	120	200

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

For example, the expected frequency for Regular & Yes is:

$$E = \frac{100 \times 80}{200} = 40$$

The expected frequencies for the other cells are found in the same way.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The calculation gives $\chi^2 \approx 8.33$.

With df = (rows - 1) × (columns - 1) = 1, the critical value at α = 0.05 is 3.84. Since χ² ≈ 8.33 > 3.84, the null hypothesis is rejected, leading to the conclusion that there is a significant relationship between exercise habits and heart disease.

If χ² is greater than the critical value, the null hypothesis is rejected → the two variables are related.
If χ² is less than the critical value, the null hypothesis is not rejected → there is no significant evidence that the variables are related.

Psychology: examining the relationship between stress and coping strategies.
Education: comparing different teaching methods.
Marketing: analyzing consumer preferences and brand choice.
Medical research: investigating the link between lifestyle factors and diseases.

Not suitable for small sample sizes.
Cannot show cause-and-effect relationships.
Applies only to categorical data.

The Chi-Square test is an important statistical technique that helps determine whether two categorical variables are independent or related. Using it correctly makes research findings more reliable.

UNIT-4(4.1) Chi square: Concept.

The Chi-Square test is a non-parametric statistical test used to determine if there is a significant association between categorical variables. It is widely used in fields such as psychology, biology, social sciences, and business to analyze frequency data and test hypotheses.

The Chi-Square test is particularly useful when dealing with nominal (categorical) data, where variables are classified into different groups without any inherent order.

This article will cover:

Concept of the Chi-Square test
Types of Chi-Square tests
- Chi-Square Test for Goodness of Fit
- Chi-Square Test for Independence
Assumptions of the Chi-Square test
Chi-Square Test Calculation (Step-by-Step)
Interpretation of Results
Applications of the Chi-Square Test
Limitations of the Chi-Square Test

The Chi-Square test (χ² test) is used to determine whether there is a statistically significant difference between the expected and observed frequencies in one or more categories.

It helps answer questions such as:

“Is there a relationship between gender and voting preference?”
“Do customer preferences for different brands differ significantly?”

The Chi-Square test is based on comparing observed data with expected data under the assumption that there is no relationship between the variables (null hypothesis).

The formula for the Chi-Square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

$\chi^2$ = Chi-Square statistic
$O$ = Observed frequency (actual data)
$E$ = Expected frequency (theoretical data based on null hypothesis)

The larger the Chi-Square value, the greater the difference between observed and expected values, indicating a possible relationship between the variables.

This test is used to determine if a sample data set fits a specific distribution. It compares observed frequencies with expected frequencies based on a given theoretical model.

Example:
A researcher wants to test whether the distribution of customers across three different brands (A, B, and C) is equal.

Brand	Observed Frequency (O)	Expected Frequency (E)
A	40	50
B	60	50
C	50	50

The researcher uses the Chi-Square test for goodness of fit to see if the differences in observed frequencies are statistically significant.
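
The brand example above can be carried through numerically. This is a minimal sketch: the function name `chi_square_goodness_of_fit` is an assumption, and the expected frequencies default to an equal split across categories, which matches the expected column in the table.

```python
def chi_square_goodness_of_fit(observed, expected=None):
    """Chi-square statistic and df for a goodness-of-fit test."""
    if expected is None:
        # Equal-distribution hypothesis: same expected count in every category.
        expected = [sum(observed) / len(observed)] * len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df

brand_counts = [40, 60, 50]   # observed customers for brands A, B, C

chi2_gof, df_gof = chi_square_goodness_of_fit(brand_counts)
print(chi2_gof, df_gof)
```

For these counts the statistic is 100/50 + 100/50 + 0 = 4.0 with df = 2. This is below the tabled critical value of 5.99 at α = 0.05, so the hypothesis of equal distribution would not be rejected.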

This test determines whether two categorical variables are independent or related.

Example:
A researcher wants to test whether gender (Male/Female) is related to voting preference (Party X / Party Y).

Gender	Party X	Party Y
Male	30	20
Female	25	25

The Chi-Square test for independence helps determine whether gender and voting preference are associated.

Before using the Chi-Square test, certain assumptions must be met:

Data should be categorical (e.g., gender, occupation, preferences).
Observations should be independent (each data point represents a separate subject).
Expected frequency should be at least 5 for each category.
Sample size should be sufficiently large to ensure accurate results.

Let’s go through an example of the Chi-Square Test for Independence.

A survey is conducted to see if there is a relationship between exercise habits (Regular/Irregular) and Heart Disease (Yes/No) among 200 people.

Exercise Habit	Heart Disease (Yes)	Heart Disease (No)	Total
Regular	30	70	100
Irregular	50	50	100
Total	80	120	200

Step 1: Calculate Expected Frequencies
The expected frequency for each cell is calculated as:

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

For Regular & Yes:

$$E = \frac{100 \times 80}{200} = 40$$

For Regular & No:

$$E = \frac{100 \times 120}{200} = 60$$

For Irregular & Yes:

$$E = \frac{100 \times 80}{200} = 40$$

For Irregular & No:

$$E = \frac{100 \times 120}{200} = 60$$

Step 2: Apply the Chi-Square Formula

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(30-40)^2}{40} + \frac{(70-60)^2}{60} + \frac{(50-40)^2}{40} + \frac{(50-60)^2}{60}$$

$$= \frac{100}{40} + \frac{100}{60} + \frac{100}{40} + \frac{100}{60} = 2.5 + 1.67 + 2.5 + 1.67 \approx 8.33$$

Step 3: Compare with the Critical Value
For df = (rows - 1) × (columns - 1) = (2-1) × (2-1) = 1, the critical value from the Chi-Square table at α = 0.05 is 3.84.

Since χ² ≈ 8.33 > 3.84, we reject the null hypothesis, meaning there is a significant relationship between exercise habits and heart disease.
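
The three steps above can be combined into one short routine. This is a sketch under stated assumptions: the function name `chi_square_independence` is hypothetical, the table is passed as a list of rows of observed counts, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count: row total x column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: Regular, Irregular exercise; columns: heart disease Yes, No
exercise_table = [[30, 70], [50, 50]]

chi2_ind, df_ind = chi_square_independence(exercise_table)
print(round(chi2_ind, 2), df_ind)
```

The same function works for larger tables, since the expected-frequency rule and the df formula do not depend on the table being 2 × 2.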

If χ² is greater than the critical value, reject the null hypothesis → There is a significant relationship.
If χ² is less than the critical value, fail to reject the null hypothesis → No significant relationship.

Psychology: Examining the relationship between stress levels and coping mechanisms.
Education: Analyzing student preferences for different teaching methods.
Marketing: Studying consumer preferences for brands based on demographic groups.
Medical Research: Investigating the link between lifestyle factors and diseases.

Unreliable for small samples: the test should not be used when expected frequencies fall below 5.
Does not indicate cause-and-effect relationships.
Only works for categorical data, not numerical data.

The Chi-Square test is a powerful statistical tool used to analyze relationships between categorical variables. By comparing observed and expected frequencies, it helps researchers determine whether variables are independent or related. However, its proper use requires meeting its assumptions and interpreting results correctly.

UNIT-4(4.2) Computation of Chi-Square: Equal Distribution Hypothesis and Independent Hypothesis.

The Chi-Square (χ²) test is an important non-parametric statistical test used to check whether there is a significant association between two categorical variables. It is used mainly to analyze frequency data and to test hypotheses.

There are two main types of Chi-Square test:

Chi-Square Goodness-of-Fit Test (Equal Distribution Hypothesis): checks whether the observed distribution matches the expected distribution.
Chi-Square Test of Independence (Independent Hypothesis): checks whether two categorical variables are independent of each other or related.

This article explains the following topics in detail:

Formula and concept of the Chi-Square test
Computation of Chi-Square for the Equal Distribution Hypothesis
Computation of Chi-Square for the Independent Hypothesis
Interpretation of results
Applications of the Chi-Square test
Limitations of the Chi-Square test

The Chi-Square statistic is calculated using the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where:

$\chi^2$ = Chi-Square statistic
$O$ = Observed frequency
$E$ = Expected frequency

A large χ² value means that there is a large difference between the observed and expected data, which suggests that the two variables may be related.

This test is used to check whether the distribution of a data set matches a theoretical or assumed distribution.

For example, if three brands (A, B, and C) are assumed to be in equal demand, their sales should also be equally distributed.

A researcher wants to test whether customers prefer three brands (A, B, and C) equally. A survey of 150 customers produced the following data:

Brand	Observed Frequency (O)	Expected Frequency (E)
A	55	50
B	45	50
C	50	50

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(55-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(50-50)^2}{50} = \frac{25}{50} + \frac{25}{50} + \frac{0}{50} = 0.5 + 0.5 + 0 = 1.0$$

$$df = k - 1 = 3 - 1 = 2$$

where k = number of categories.

At df = 2 and α = 0.05, the critical value is 5.99.

Since 1.0 < 5.99, we fail to reject the null hypothesis, which means there is no significant difference in brand preference.

This test is used to check whether two categorical variables are independent or related.

Example: is there an association between gender (Male/Female) and voting preference (Party X/Party Y)?

A survey of 200 voters gave the following data:

Gender	Party X	Party Y	Total
Male	40	60	100
Female	50	50	100
Total	90	110	200

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

Example: expected frequency for Male and Party X

$$E = \frac{100 \times 90}{200} = 45$$

The other expected frequencies are obtained in the same way.

Gender	Party X (O)	Party X (E)	Party Y (O)	Party Y (E)
Male	40	45	60	55
Female	50	45	50	55

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Carrying out the calculation over the four cells,

$$\chi^2 = \frac{25}{45} + \frac{25}{55} + \frac{25}{45} + \frac{25}{55} \approx 2.02$$

$$df = (\text{rows} - 1) \times (\text{columns} - 1) = (2 - 1) \times (2 - 1) = 1$$

Critical value = 3.84 (α = 0.05, df = 1).
Since 2.02 < 3.84, we cannot reject the null hypothesis; that is, the data show no significant association between gender and voting preference.

Psychology: Relationship between personality type and stress response.
Education: Different teaching methods and student performance.
Health: Relationship between lifestyle and disease.

The Chi-Square test is an important statistical tool for testing the equal distribution and independence hypotheses. Correct computation and interpretation make research conclusions more reliable.

UNIT-4 (4.2) Computation of Chi-Square: Equal Distribution Hypothesis and Independent Hypothesis

The Chi-Square (χ²) test is a widely used non-parametric statistical test for categorical data. It compares observed frequencies with the frequencies expected under a null hypothesis, either to check a hypothesized distribution or to determine whether two categorical variables are associated. It is particularly useful for analyzing frequency data in research fields such as psychology, the social sciences, medicine, and business.

There are two main types of Chi-Square tests:

Chi-Square Goodness-of-Fit Test (Equal Distribution Hypothesis) – Determines if an observed distribution matches an expected distribution.
Chi-Square Test for Independence (Independent Hypothesis) – Determines if two categorical variables are related or independent.

This article will cover:

Concept and formula of the Chi-Square test
Computation of the Chi-Square test for the Equal Distribution Hypothesis
Computation of the Chi-Square test for the Independent Hypothesis
Interpretation of results
Applications of the Chi-Square test
Limitations of the Chi-Square test

The Chi-Square statistic is calculated using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

$\chi^2$ = Chi-Square statistic
$O$ = Observed frequency (actual count)
$E$ = Expected frequency (theoretical count based on the hypothesis)

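A large Chi-Square value means that the observed frequencies are far from the expected frequencies. When the value exceeds the critical value for the given degrees of freedom, the difference is statistically significant, which indicates either an association between the variables or a deviation from the expected distribution.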
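The formula above can be sketched as a small Python function. This is a minimal illustration, not a replacement for a statistics package; the function name `chi_square_statistic` and the two sets of counts are made up for the example and do not come from the article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only.
close_fit = chi_square_statistic([48, 52], [50, 50])  # observed close to expected
poor_fit = chi_square_statistic([20, 80], [50, 50])   # observed far from expected
print(close_fit, poor_fit)
```

The second call gives a much larger value than the first, which matches the idea that the statistic grows as the observed counts move away from the expected counts.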

The Goodness-of-Fit test is used when we want to check whether an observed categorical data distribution follows a theoretically expected distribution.

For example, if we assume that customers choose three brands (A, B, and C) equally, the expected frequency should be the same for each brand.

A researcher wants to test if customers prefer three brands (A, B, and C) equally. A sample of 150 customers was surveyed, and their responses were recorded.

Brand	Observed Frequency (O)	Expected Frequency (E)
A	55	50
B	45	50
C	50	50

Step 1: Calculate the Chi-Square Statistic

Using the Chi-Square formula:

$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(55-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(50-50)^2}{50} \\
&= \frac{(5)^2}{50} + \frac{(-5)^2}{50} + \frac{(0)^2}{50} \\
&= \frac{25}{50} + \frac{25}{50} + 0 \\
&= 0.5 + 0.5 + 0 \\
&= 1.0
\end{aligned}
$$

Step 2: Determine the Degrees of Freedom (df)

$$df = (k - 1) = (3 - 1) = 2$$

where k is the number of categories.

Step 3: Compare with the Chi-Square Critical Value
From the Chi-Square table, at df = 2 and α = 0.05, the critical value is 5.99.

Since 1.0 < 5.99, we fail to reject the null hypothesis. This means that there is no significant difference in brand preference: the data are consistent with the assumption of equal distribution, although they do not prove it.
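The three steps of the brand example can be reproduced with a short script. This is a sketch using only the standard library; the variable names are illustrative, and the critical value 5.99 is taken from the text rather than computed. The p-value line relies on the fact that, for df = 2 only, the upper-tail probability of the chi-square distribution equals exp(-x/2).

```python
import math

observed = {"A": 55, "B": 45, "C": 50}  # counts from the survey table
n = sum(observed.values())              # 150 customers
k = len(observed)                       # 3 categories
expected = n / k                        # 50 per brand under equal preference

# Step 1: chi-square statistic
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

# Step 2: degrees of freedom
df = k - 1

# Step 3: compare with the table value for df = 2, alpha = 0.05 (from the text)
CRITICAL_VALUE = 5.99
reject_null = chi_square > CRITICAL_VALUE

# Closed form valid for df = 2 only: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, "
      f"p = {p_value:.3f}, reject H0: {reject_null}")
# prints: chi-square = 1.00, df = 2, p = 0.607, reject H0: False
```

The p-value of about 0.61 is far above 0.05, which agrees with the critical-value comparison.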

The Chi-Square Test for Independence is used when we want to check whether two categorical variables are related or independent.

For example, we might want to check whether gender (Male/Female) is associated with voting preference (Party X/Party Y).

A political analyst surveys 200 voters to see whether gender is associated with voting preference.

Gender	Party X	Party Y	Total
Male	40	60	100
Female	50	50	100
Total	90	110	200

Step 1: Calculate Expected Frequencies (E)
The expected frequency for each cell is calculated as:

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

For Male & Party X:

$$E = \frac{100 \times 90}{200} = 45$$

For Male & Party Y:

$$E = \frac{100 \times 110}{200} = 55$$

For Female & Party X:

$$E = \frac{100 \times 90}{200} = 45$$

For Female & Party Y:

$$E = \frac{100 \times 110}{200} = 55$$

Gender	Party X (O)	Party X (E)	Party Y (O)	Party Y (E)
Male	40	45	60	55
Female	50	45	50	55
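The expected frequencies in this table can be computed for every cell at once. The helper below is a minimal sketch; the name `expected_frequencies` and the list-of-rows layout are assumptions made for the example.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed_table = [
    [40, 60],  # Male:   Party X, Party Y
    [50, 50],  # Female: Party X, Party Y
]
expected_table = expected_frequencies(observed_table)
print(expected_table)  # prints: [[45.0, 55.0], [45.0, 55.0]]
```

Because the same function works for any number of rows and columns, it also covers tables larger than 2 × 2.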

Step 2: Compute the Chi-Square Statistic

$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(40-45)^2}{45} + \frac{(60-55)^2}{55} + \frac{(50-45)^2}{45} + \frac{(50-55)^2}{55} \\
&= \frac{(-5)^2}{45} + \frac{(5)^2}{55} + \frac{(5)^2}{45} + \frac{(-5)^2}{55} \\
&= \frac{25}{45} + \frac{25}{55} + \frac{25}{45} + \frac{25}{55} \\
&\approx 0.56 + 0.45 + 0.56 + 0.45 \\
&= 2.02
\end{aligned}
$$

Step 3: Determine the Degrees of Freedom (df)

$$df = (\text{rows} - 1) \times (\text{columns} - 1) = (2 - 1) \times (2 - 1) = 1$$

Step 4: Compare with the Chi-Square Critical Value
From the Chi-Square table, at df = 1 and α = 0.05, the critical value is 3.84.

Since 2.02 < 3.84, we fail to reject the null hypothesis, meaning the data show no significant association between gender and voting preference.
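The four steps of the voting example can be checked with the sketch below. It uses only the standard library, takes the critical value 3.84 from the text, and applies no continuity correction, so it mirrors the hand calculation. The p-value line uses a closed form that holds for df = 1 only: the upper-tail probability equals erfc(sqrt(x/2)).

```python
import math

observed_table = [
    [40, 60],  # Male:   Party X, Party Y
    [50, 50],  # Female: Party X, Party Y
]

row_totals = [sum(row) for row in observed_table]
col_totals = [sum(col) for col in zip(*observed_table)]
grand_total = sum(row_totals)

# Steps 1 and 2: expected frequency per cell, then the chi-square sum
chi_square = 0.0
for i, row in enumerate(observed_table):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

# Step 3: degrees of freedom
df = (len(observed_table) - 1) * (len(observed_table[0]) - 1)

# Step 4: compare with the table value for df = 1, alpha = 0.05 (from the text)
CRITICAL_VALUE = 3.84
reject_null = chi_square > CRITICAL_VALUE

# Closed form valid for df = 1 only: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
print(f"p-value = {p_value:.3f}")
```

The unrounded statistic is slightly above 2.02 (about 2.0202), and the p-value comes out at roughly 0.16, well above 0.05, so both routes lead to the same decision.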

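If χ² > critical value, reject the null hypothesis → The data show a significant association between the variables (or a significant departure from the expected distribution).
If χ² ≤ critical value, fail to reject the null hypothesis → The data give no significant evidence of an association; this does not prove that the variables are independent.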
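This decision rule can be written as a small helper. The sketch below assumes α = 0.05 and includes only a few degrees of freedom; the critical values are the usual table values to three decimals (the text rounds them to 3.84 and 5.99), and the names `CRITICAL_VALUES_05` and `decision` are illustrative.

```python
# Upper-tail chi-square critical values at alpha = 0.05 from standard tables.
# Only a few df values are listed; extend the dictionary as needed.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def decision(chi_square, df):
    """Apply the critical-value rule at alpha = 0.05."""
    critical = CRITICAL_VALUES_05[df]
    if chi_square > critical:
        return "reject H0"
    return "fail to reject H0"


print(decision(1.0, 2))   # brand example  -> fail to reject H0
print(decision(2.02, 1))  # voting example -> fail to reject H0
print(decision(6.5, 2))   # hypothetical statistic -> reject H0
```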

Psychology: Relationship between stress and coping mechanisms.
Marketing: Brand preference across age groups.
Education: Student performance based on teaching methods.
Healthcare: Disease prevalence based on lifestyle factors.

The Chi-Square test is a crucial statistical tool for analyzing categorical data. The Goodness-of-Fit test checks if an observed distribution matches an expected one, while the Test for Independence determines whether two categorical variables are related.

UnNoticed Digital College March 2, 2025
