Importing the required libraries

In [32]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Reading the CSV file to be analysed into a DataFrame

In [33]:

df = pd.read_csv('students_score.csv')

Printing the first five rows with head()

In [34]:

print(df.head())

   Unnamed: 0  Gender EthnicGroup          ParentEduc     LunchType TestPrep  \
0           0  female         NaN   bachelor's degree      standard     none   
1           1  female     group C        some college      standard      NaN   
2           2  female     group B     master's degree      standard     none   
3           3    male     group A  associate's degree  free/reduced     none   
4           4    male     group C        some college      standard     none   

  ParentMaritalStatus PracticeSport IsFirstChild  NrSiblings TransportMeans  \
0             married     regularly          yes         3.0     school_bus   
1             married     sometimes          yes         0.0            NaN   
2              single     sometimes          yes         4.0     school_bus   
3             married         never           no         1.0            NaN   
4             married     sometimes          yes         0.0     school_bus   

  WklyStudyHours  MathScore  ReadingScore  WritingScore  
0            < 5         71            71            74  
1         5 - 10         69            90            88  
2            < 5         87            93            91  
3         5 - 10         45            56            42  
4         5 - 10         76            78            75

Summary statistics of the numeric columns with describe(): count, mean, standard deviation, min, quartiles, and max

In [35]:

print(df.describe())

         Unnamed: 0    NrSiblings     MathScore  ReadingScore  WritingScore
count  30641.000000  29069.000000  30641.000000  30641.000000  30641.000000
mean     499.556607      2.145894     66.558402     69.377533     68.418622
std      288.747894      1.458242     15.361616     14.758952     15.443525
min        0.000000      0.000000      0.000000     10.000000      4.000000
25%      249.000000      1.000000     56.000000     59.000000     58.000000
50%      500.000000      2.000000     67.000000     70.000000     69.000000
75%      750.000000      3.000000     78.000000     80.000000     79.000000
max      999.000000      7.000000    100.000000    100.000000    100.000000
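By default, describe() summarizes only the numeric columns, which is why just five columns appear above. A minimal sketch on a small hypothetical DataFrame (not the real dataset) showing that passing `exclude=[np.number]`, a documented describe() option, summarizes the text columns instead: count, number of unique values, the most frequent value (top), and how often it occurs (freq).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for students_score.csv
df = pd.DataFrame({
    "Gender": ["female", "female", "male", "male", "female"],
    "LunchType": ["standard", "standard", "free/reduced", "standard", "standard"],
    "MathScore": [71, 69, 45, 76, 87],
})

# Default: numeric columns only
print(df.describe())

# Non-numeric columns: count, unique, top, freq
summary = df.describe(exclude=[np.number])
print(summary)
```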

Printing the first 2 rows

In [36]:

print(df.head(2))

   Unnamed: 0  Gender EthnicGroup         ParentEduc LunchType TestPrep  \
0           0  female         NaN  bachelor's degree  standard     none   
1           1  female     group C       some college  standard      NaN   

  ParentMaritalStatus PracticeSport IsFirstChild  NrSiblings TransportMeans  \
0             married     regularly          yes         3.0     school_bus   
1             married     sometimes          yes         0.0            NaN   

  WklyStudyHours  MathScore  ReadingScore  WritingScore  
0            < 5         71            71            74  
1         5 - 10         69            90            88

Displaying the first 2 rows as a table: when a DataFrame is the last expression in a cell, Jupyter renders it as a table instead of plain text

In [37]:

df.head(2)

Out[37]:

	Unnamed: 0	Gender	EthnicGroup	ParentEduc	LunchType	TestPrep	ParentMaritalStatus	PracticeSport	IsFirstChild	NrSiblings	TransportMeans	WklyStudyHours	MathScore	ReadingScore	WritingScore
0	0	female	NaN	bachelor's degree	standard	none	married	regularly	yes	3.0	school_bus	< 5	71	71	74
1	1	female	group C	some college	standard	NaN	married	sometimes	yes	0.0	NaN	5 - 10	69	90	88

Displaying the last 2 rows as a table with tail(2)

In [38]:

df.tail(2)

Out[38]:

	Unnamed: 0	Gender	EthnicGroup	ParentEduc	LunchType	TestPrep	ParentMaritalStatus	PracticeSport	IsFirstChild	NrSiblings	TransportMeans	WklyStudyHours	MathScore	ReadingScore	WritingScore
30639	934	female	group D	associate's degree	standard	completed	married	regularly	no	3.0	school_bus	5 - 10	82	90	93
30640	960	male	group B	some college	standard	none	married	never	no	1.0	school_bus	5 - 10	64	60	58

Printing the last five rows with tail()

In [39]:

print(df.tail())

       Unnamed: 0  Gender EthnicGroup          ParentEduc     LunchType  \
30636         816  female     group D         high school      standard   
30637         890    male     group E         high school      standard   
30638         911  female         NaN         high school  free/reduced   
30639         934  female     group D  associate's degree      standard   
30640         960    male     group B        some college      standard   

        TestPrep ParentMaritalStatus PracticeSport IsFirstChild  NrSiblings  \
30636       none              single     sometimes           no         2.0   
30637       none              single     regularly           no         1.0   
30638  completed             married     sometimes           no         1.0   
30639  completed             married     regularly           no         3.0   
30640       none             married         never           no         1.0   

      TransportMeans WklyStudyHours  MathScore  ReadingScore  WritingScore  
30636     school_bus         5 - 10         59            61            65  
30637        private         5 - 10         58            53            51  
30638        private         5 - 10         61            70            67  
30639     school_bus         5 - 10         82            90            93  
30640     school_bus         5 - 10         64            60            58

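df.info() lists each column's non-null count and dtype, plus memory usage; comparing the non-null counts with the 30641 entries shows which columns have missing values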

In [40]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30641 entries, 0 to 30640
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Unnamed: 0           30641 non-null  int64  
 1   Gender               30641 non-null  object 
 2   EthnicGroup          28801 non-null  object 
 3   ParentEduc           28796 non-null  object 
 4   LunchType            30641 non-null  object 
 5   TestPrep             28811 non-null  object 
 6   ParentMaritalStatus  29451 non-null  object 
 7   PracticeSport        30010 non-null  object 
 8   IsFirstChild         29737 non-null  object 
 9   NrSiblings           29069 non-null  float64
 10  TransportMeans       27507 non-null  object 
 11  WklyStudyHours       29686 non-null  object 
 12  MathScore            30641 non-null  int64  
 13  ReadingScore         30641 non-null  int64  
 14  WritingScore         30641 non-null  int64  
dtypes: float64(1), int64(4), object(10)
memory usage: 3.5+ MB
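The Non-Null Count is the number of rows minus the number of missing values; for EthnicGroup, 30641 - 28801 = 1840, which matches the isnull() total in the next step. A minimal sketch on hypothetical toy data showing that df.count() returns the same per-column non-null counts as info(), but as a Series you can compute with:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
df = pd.DataFrame({
    "Gender": ["female", "male", "female", "male"],
    "EthnicGroup": [np.nan, "group C", "group B", np.nan],
    "NrSiblings": [3.0, 0.0, np.nan, 1.0],
})

non_null = df.count()          # same numbers as info()'s "Non-Null Count"
missing = len(df) - non_null   # rows minus non-null values = missing values
print(non_null)
print(missing)
```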

Counting the missing (null) values in each column

In [41]:

df.isnull().sum()

Out[41]:

Unnamed: 0                0
Gender                    0
EthnicGroup            1840
ParentEduc             1845
LunchType                 0
TestPrep               1830
ParentMaritalStatus    1190
PracticeSport           631
IsFirstChild            904
NrSiblings             1572
TransportMeans         3134
WklyStudyHours          955
MathScore                 0
ReadingScore              0
WritingScore              0
dtype: int64
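To judge how much each gap matters, it helps to express the counts as a share of the 30641 rows. A sketch that reuses the counts printed above; on the full DataFrame, `df.isnull().mean() * 100` should give the same percentages directly, because the mean of a boolean column is the fraction of True values. TransportMeans, the column with the most gaps, is missing in roughly 10% of rows.

```python
import pandas as pd

# Null counts copied from the isnull().sum() output above; 30641 rows in total
total_rows = 30641
nulls = pd.Series({
    "EthnicGroup": 1840, "ParentEduc": 1845, "TestPrep": 1830,
    "ParentMaritalStatus": 1190, "PracticeSport": 631, "IsFirstChild": 904,
    "NrSiblings": 1572, "TransportMeans": 3134, "WklyStudyHours": 955,
})

# Percentage of rows missing in each column, largest first
pct_missing = (nulls / total_rows * 100).round(2).sort_values(ascending=False)
print(pct_missing)
```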

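Dropping the "Unnamed: 0" column, which appears to be a leftover row index from the CSV file. drop() returns a new DataFrame rather than changing df in place, so the result is assigned back to df; the CSV file itself is not modified.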

In [42]:

df = df.drop("Unnamed: 0",axis = 1)

In [43]:

df.head()

Out[43]:

	Gender	EthnicGroup	ParentEduc	LunchType	TestPrep	ParentMaritalStatus	PracticeSport	IsFirstChild	NrSiblings	TransportMeans	WklyStudyHours	MathScore	ReadingScore	WritingScore
0	female	NaN	bachelor's degree	standard	none	married	regularly	yes	3.0	school_bus	< 5	71	71	74
1	female	group C	some college	standard	NaN	married	sometimes	yes	0.0	NaN	5 - 10	69	90	88
2	female	group B	master's degree	standard	none	single	sometimes	yes	4.0	school_bus	< 5	87	93	91
3	male	group A	associate's degree	free/reduced	none	married	never	no	1.0	NaN	5 - 10	45	56	42
4	male	group C	some college	standard	none	married	sometimes	yes	0.0	school_bus	5 - 10	76	78	75
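A small sketch, using a hypothetical CSV snippet in place of students_score.csv, of where "Unnamed: 0" comes from and of drop() leaving the original DataFrame untouched until the result is reassigned. `drop(columns=...)` is an equivalent spelling of `drop(..., axis=1)`.

```python
import io
import pandas as pd

# Hypothetical CSV text whose first header cell is empty, like students_score.csv
csv_text = """,Gender,MathScore
0,female,71
1,female,69
2,female,87
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())  # the nameless first column is called "Unnamed: 0"

# drop() returns a new DataFrame; df still has the column
trimmed = df.drop("Unnamed: 0", axis=1)
print("Unnamed: 0" in df.columns, "Unnamed: 0" in trimmed.columns)

# Equivalent keyword form
trimmed2 = df.drop(columns="Unnamed: 0")
```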

For this analysis, relabel the WklyStudyHours category "5 - 10" as "> 5".

In [44]:

df["WklyStudyHours"] = df["WklyStudyHours"].str.replace("5 - 10","> 5")

In [45]:

df.head()

Out[45]:

	Gender	EthnicGroup	ParentEduc	LunchType	TestPrep	ParentMaritalStatus	PracticeSport	IsFirstChild	NrSiblings	TransportMeans	WklyStudyHours	MathScore	ReadingScore	WritingScore
0	female	NaN	bachelor's degree	standard	none	married	regularly	yes	3.0	school_bus	< 5	71	71	74
1	female	group C	some college	standard	NaN	married	sometimes	yes	0.0	NaN	> 5	69	90	88
2	female	group B	master's degree	standard	none	single	sometimes	yes	4.0	school_bus	< 5	87	93	91
3	male	group A	associate's degree	free/reduced	none	married	never	no	1.0	NaN	> 5	45	56	42
4	male	group C	some college	standard	none	married	sometimes	yes	0.0	school_bus	> 5	76	78	75
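`.str.replace()` substitutes matching substrings inside each value, while `Series.replace()` with a dict swaps only values that match exactly, which states the intent of relabelling one category more directly. A sketch on hypothetical sample values. Note that the column also has a separate "> 10" category, so after this relabelling "> 5" effectively means "5 to 10 hours".

```python
import pandas as pd

# Hypothetical sample of the WklyStudyHours categories, with one missing value
hours = pd.Series(["< 5", "5 - 10", "> 10", "5 - 10", None])

# Whole-value replacement: other categories and missing values are left alone
relabelled = hours.replace({"5 - 10": "> 5"})
print(relabelled.value_counts(dropna=False))
```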

Number of students by ParentMaritalStatus (bar chart)

In [46]:

# Count the students in each ParentMaritalStatus category and write each count on its bar
a = sns.countplot(data = df , x = 'ParentMaritalStatus')
a.bar_label(a.containers[0])
plt.show()


Number of students by TransportMeans (bar chart)

In [47]:

b = sns.countplot(data = df , x = 'TransportMeans')
b.bar_label(b.containers[0])  # label the bars of this chart (b), not the earlier chart (a)
plt.show()

Number of students at each MathScore value (bar chart)

In [48]:

plt.figure (figsize = (24,8))
c = sns.countplot(data = df , x = 'MathScore')
c.bar_label(c.containers[0])  # label the bars of this chart (c), not the earlier chart (a)
plt.show()
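bar_label() labels the bars in the container it is given, so the container must come from the same Axes as the chart being drawn (b.containers[0] for b, c.containers[0] for c). A minimal sketch with plain matplotlib and hypothetical counts; the Agg backend is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical category counts
categories = ["private", "school_bus"]
counts = [12, 18]

fig, ax = plt.subplots()
ax.bar(categories, counts)

# Each Axes keeps its own bar containers; label the bars of *this* Axes
texts = ax.bar_label(ax.containers[0])
print([t.get_text() for t in texts])
plt.close(fig)
```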

Mean MathScore, ReadingScore, and WritingScore by parental education level

In [49]:

gb = df.groupby("ParentEduc").agg({"MathScore":'mean',"ReadingScore":'mean',"WritingScore":'mean'})
gb

Out[49]:

	MathScore	ReadingScore	WritingScore
ParentEduc
associate's degree	68.365586	71.124324	70.299099
bachelor's degree	70.466627	73.062020	73.331069
high school	64.435731	67.213997	65.421136
master's degree	72.336134	75.832921	76.356896
some college	66.390472	69.179708	68.501432
some high school	62.584013	65.510785	63.632409

In [50]:

print(gb)

                    MathScore  ReadingScore  WritingScore
ParentEduc                                               
associate's degree  68.365586     71.124324     70.299099
bachelor's degree   70.466627     73.062020     73.331069
high school         64.435731     67.213997     65.421136
master's degree     72.336134     75.832921     76.356896
some college        66.390472     69.179708     68.501432
some high school    62.584013     65.510785     63.632409
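Since every score column uses the same aggregation, the dict passed to agg() can be replaced by selecting the columns and calling mean(). A sketch on hypothetical toy rows confirming that both forms give the same table; sorting by a score column, as in the last line, also makes the ordering in the table above (lowest for "some high school", highest for "master's degree") easier to read.

```python
import pandas as pd

# Hypothetical toy rows; the notebook groups the full df the same way
df = pd.DataFrame({
    "ParentEduc": ["high school", "high school", "master's degree", "master's degree"],
    "MathScore": [60, 70, 80, 90],
    "ReadingScore": [65, 75, 85, 95],
    "WritingScore": [62, 72, 82, 92],
})
scores = ["MathScore", "ReadingScore", "WritingScore"]

# Dict form used in the notebook
gb_agg = df.groupby("ParentEduc").agg({c: "mean" for c in scores})

# Equivalent shorter form: select the score columns, then take the mean
gb_short = df.groupby("ParentEduc")[scores].mean()

print(gb_short.sort_values("MathScore"))
```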

Heatmap of the mean scores by parental education level

In [51]:

plt.figure(figsize=(8,6))
sns.heatmap(gb,annot= True)
plt.show()

In [52]:

plt.figure(figsize=(4,4))
sns.heatmap(gb,cmap="BuPu",annot= True)
plt.show()
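With annot=True, seaborn formats each cell with its default fmt=".2g" (two significant digits), so a mean such as 64.435731 is labelled 64. That hides most of the differences in the marital-status and sibling tables below, where the means differ by less than a point. A sketch using two rows of the table above with fmt=".1f" for one decimal place; the Agg backend is an assumption so it runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Two rows copied from the ParentEduc means above
gb_sample = pd.DataFrame(
    {"MathScore": [64.435731, 72.336134], "ReadingScore": [67.213997, 75.832921]},
    index=["high school", "master's degree"],
)

fig, ax = plt.subplots(figsize=(4, 3))
# fmt=".1f" prints one decimal place instead of the default ".2g"
sns.heatmap(gb_sample, annot=True, fmt=".1f", cmap="BuPu", ax=ax)
labels = sorted(t.get_text() for t in ax.texts)
print(labels)
plt.close(fig)
```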

Mean scores and heatmap by parents' marital status

In [53]:

gb1 = df.groupby("ParentMaritalStatus").agg({"MathScore":'mean',"ReadingScore":'mean',"WritingScore":'mean'})

In [54]:

gb1

Out[54]:

	MathScore	ReadingScore	WritingScore
ParentMaritalStatus
divorced	66.691197	69.655011	68.799146
married	66.657326	69.389575	68.420981
single	66.165704	69.157250	68.174440
widowed	67.368866	69.651438	68.563452

In [55]:

plt.figure(figsize=(4,4))
sns.heatmap(gb1,cmap="viridis",annot= True)
plt.show()

Mean scores and heatmap by number of siblings (NrSiblings)

In [78]:

gb2 = df.groupby("NrSiblings").agg({"MathScore":'mean',"ReadingScore":'mean',"WritingScore":'mean'})

In [79]:

gb2

Out[79]:

	MathScore	ReadingScore	WritingScore
NrSiblings
0.0	66.819449	69.547812	68.746515
1.0	66.473896	69.259097	68.245345
2.0	66.554934	69.472018	68.522533
3.0	66.719092	69.488159	68.650498
4.0	66.245495	69.144169	68.073444
5.0	66.630303	69.453788	68.282576
6.0	65.917219	68.801325	67.860927
7.0	67.615120	69.828179	68.986254

In [80]:

plt.figure(figsize=(4,4))
sns.heatmap(gb2,cmap="viridis",annot= True)
plt.show()

In [81]:

df.head(2)

Out[81]:

	Gender	EthnicGroup	ParentEduc	LunchType	TestPrep	ParentMaritalStatus	PracticeSport	IsFirstChild	NrSiblings	TransportMeans	WklyStudyHours	MathScore	ReadingScore	WritingScore
0	female	NaN	bachelor's degree	standard	none	married	regularly	yes	3.0	school_bus	< 5	71	71	74
1	female	group C	some college	standard	NaN	married	sometimes	yes	0.0	NaN	> 5	69	90	88

Mean scores and heatmaps by first-child status (IsFirstChild) and weekly study hours (WklyStudyHours)

In [87]:

gb3 = df.groupby("IsFirstChild").agg({"MathScore":'mean',"ReadingScore":'mean',"WritingScore":'mean'})

In [88]:

gb4 = df.groupby("WklyStudyHours").agg({"MathScore":'mean',"ReadingScore":'mean',"WritingScore":'mean'})
gb4

Out[88]:

	MathScore	ReadingScore	WritingScore
WklyStudyHours
< 5	64.580359	68.176135	67.090192
> 10	68.696655	70.365436	69.777778
> 5	66.870491	69.660532	68.636280

In [89]:

gb3

Out[89]:

	MathScore	ReadingScore	WritingScore
IsFirstChild
no	66.246832	69.132614	68.210887
yes	66.740646	69.542553	68.558484

In [90]:

sns.heatmap(gb3, cmap="viridis", annot=True)
plt.show()

In [64]:

sns.heatmap(gb4, cmap="viridis", annot=True)
plt.show()

Box plot by weekly study hours

In [91]:

# WklyStudyHours is categorical, so it needs a numeric score on the other axis
sns.boxplot(data=df, x="WklyStudyHours", y="MathScore")
plt.show()

Box plot of WritingScore

In [92]:

sns.boxplot(data = df, x = "WritingScore")
plt.show()

Box plot of MathScore

In [93]:

sns.boxplot(data = df, x = "MathScore")
plt.show()
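A box plot draws the box from the first quartile (Q1) to the third quartile (Q3), and by default (whis=1.5 in seaborn and matplotlib) each whisker reaches the most extreme data point within 1.5 × IQR of the box; points beyond that are drawn individually. With the MathScore quartiles from describe() (Q1 = 56, Q3 = 78), IQR = 22, so the lower fence is 56 - 33 = 23 and the upper fence is 78 + 33 = 111; since the maximum score is 100, only low scores can show up as outliers. The helper below (`whisker_fences` is a hypothetical name) does the same arithmetic:

```python
import pandas as pd

def whisker_fences(q1, q3, whis=1.5):
    """Return the (lower, upper) outlier fences of a standard box plot."""
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# Quartiles of MathScore from the describe() output above
low, high = whisker_fences(56.0, 78.0)
print(low, high)  # scores below `low` are drawn as individual points

# Same idea on a hypothetical toy Series
scores = pd.Series([0, 45, 56, 60, 67, 70, 78, 90, 100])
q1, q3 = scores.quantile([0.25, 0.75])
lo, hi = whisker_fences(q1, q3)
outliers = scores[(scores < lo) | (scores > hi)]
print(outliers.tolist())
```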

Box plot of ReadingScore

In [68]:

sns.boxplot(data = df, x = "ReadingScore")
plt.show()

In [69]:

print(df["EthnicGroup"].unique())

[nan 'group C' 'group B' 'group A' 'group D' 'group E']
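unique() lists the labels, including nan for the missing entries. value_counts() returns the number of rows for each label in one call, which the next cells compute one group at a time with .loc[...].count(); on the real data, `df["EthnicGroup"].value_counts().sort_index()` should give the group A to group E counts in order. A sketch on hypothetical sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the EthnicGroup column, including a missing value
ethnic = pd.Series(["group C", "group B", "group C", np.nan, "group A", "group C"])

# One call counts every group; dropna=False also reports the missing values
counts = ethnic.value_counts(dropna=False)
print(counts)

# Counts in alphabetical order of the group labels, missing values excluded
by_group = ethnic.value_counts().sort_index()
print(by_group)
```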

Counting the non-null values in every column for the rows where EthnicGroup == "group A"

In [70]:

groupA = df.loc[(df["EthnicGroup"] == "group A")].count()
print(groupA)

Gender                 2219
EthnicGroup            2219
ParentEduc             2078
LunchType              2219
TestPrep               2081
ParentMaritalStatus    2121
PracticeSport          2167
IsFirstChild           2168
NrSiblings             2096
TransportMeans         1999
WklyStudyHours         2146
MathScore              2219
ReadingScore           2219
WritingScore           2219
dtype: int64

Pie chart of the EthnicGroup distribution (group A to group E), with percentages shown as whole numbers

In [100]:

groupA = df.loc[(df["EthnicGroup"] == "group A")].count()
groupB = df.loc[(df["EthnicGroup"] == "group B")].count()
groupC = df.loc[(df["EthnicGroup"] == "group C")].count()
groupD = df.loc[(df["EthnicGroup"] == "group D")].count()
groupE = df.loc[(df["EthnicGroup"] == "group E")].count()
mlist = [groupA["EthnicGroup"],groupB["EthnicGroup"] ,groupC["EthnicGroup"],groupD["EthnicGroup"] ,groupE["EthnicGroup"]]
l=['groupA','groupB','groupC','groupD','groupE']

# plt.pie(mlist, labels=l, autopct = "%1.2f%%")
print(mlist)
plt.pie(mlist, labels=l, autopct = "%1i%%")
plt.show()

[np.int64(2219), np.int64(5826), np.int64(9212), np.int64(7503), np.int64(4041)]
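The five counts add up to 28801, the non-null EthnicGroup count reported by info(). When autopct is a format string, matplotlib labels each wedge with `autopct % pct`, and the "%1i" conversion turns the float percentage into an integer by truncation, not rounding. Group C's share (just under 32%) is therefore labelled 31%, and the five integer labels sum to 98% rather than 100%. A short check in plain Python:

```python
# Group sizes printed by the cell above (group A to group E)
counts = [2219, 5826, 9212, 7503, 4041]
total = sum(counts)  # 28801, the non-null EthnicGroup count reported by info()

pcts = [100 * c / total for c in counts]

# "%1i%%" converts the float to an int, which truncates rather than rounds
truncated = ["%1i%%" % p for p in pcts]
rounded = ["%.0f%%" % p for p in pcts]
print(truncated)  # labels the integer pie chart shows
print(rounded)
```

Using `autopct="%.0f%%"` instead would round each percentage to the nearest whole number.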

The same pie chart with percentages shown to two decimal places

In [102]:

plt.pie(mlist, labels=l, autopct = "%1.2f%%")
plt.title("Distribution of Ethnic Group \n ")
plt.show()

Bar plot of the number of students in each EthnicGroup, with the count shown on each bar

In [105]:

ax = sns.countplot(data=df, x='EthnicGroup')  # `ax` avoids overwriting the label list `l` used above
ax.bar_label(ax.containers[0])

Out[105]:

[Text(0, 0, '9212'),
 Text(0, 0, '5826'),
 Text(0, 0, '2219'),
 Text(0, 0, '7503'),
 Text(0, 0, '4041')]

Student Score Analysis and Visualization using Python | Sontu Ball
