Data Science (Programming Slips)

*NOTE: Import libraries wherever they are necessary.

*NOTE: Call plt.show() wherever it is required.

 

SLIP 1

A) Write a Python program to create a pie plot showing the frequency of the three species in the Iris data. (Use iris.csv)

import pandas as pd

import numpy as np

import scipy.stats as sc

import matplotlib.pyplot as plt


df=pd.read_csv("iris.csv")

df


fig,ax=plt.subplots(1,1,figsize=(10,8))

#draw the pie on the axes created above; explode needs one entry per species

df['class'].value_counts().plot.pie(ax=ax,explode=[0.1,0.1,0.1],autopct='%1.1f%%',shadow=True)

plt.title("Iris Species ")

plt.show()


B) Write a Python program to view basic statistical details of the data. (Use winequality-red.csv)

df=pd.read_csv("winequality-red.csv")

df


print(df.describe())   #in a notebook, df.describe() on its own also displays the table

...............................................

SLIP 2

A) Write a Python program for handling missing values. Replace the missing values of the Salary and Age columns with the mean of each column. (Use Data.csv)

df=pd.read_csv("Data.csv")

df


df['Age']=df['Age'].fillna(df['Age'].mean())   #fillna returns a new Series, so assign it back

df['Salary']=df['Salary'].fillna(df['Salary'].mean())
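The same mean imputation can be sketched with scikit-learn's `SimpleImputer`, which is handy when several numeric columns need identical treatment. The small DataFrame below is made-up stand-in data, not the real contents of Data.csv.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for Data.csv: Age and Salary contain gaps
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, np.nan, 54000.0, 61000.0],
})

# Fit on the numeric columns only; strategy="mean" fills each gap with its column mean
imputer = SimpleImputer(strategy="mean")
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])
print(df)
```

The imputer remembers the learned means (`imputer.statistics_`), so the same values can be reused on new data with `imputer.transform`.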


B) Write a Python program to generate a line plot of name vs. salary.

plt.plot(df['name'],df['salary'])   #column names assumed to be 'name' and 'salary'

plt.xlabel("Name")

plt.ylabel("Salary")

plt.show()


C) Download the heights and weights dataset and load it from the given CSV file into a DataFrame. Print the first 10 rows, the last 10 rows and 20 random rows, and display the shape of the dataset.

#df=pd.read_csv("HeightWeight.csv")   #use this line to load the real file instead of the sample dictionary below

HeightWeight={'Height':[1,2,3,4,6,8,3,6,0,2,6,8,......],'Weight':[5,9,3,0,5,.....]}

df=pd.DataFrame(HeightWeight)


df.head(10)

df.tail(10)

df.sample(20)   #20 random rows

df.shape   #shape is an attribute, not a method
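A small sketch of the same calls, assuming made-up height and weight values in place of the real file. Passing `random_state` to `sample` makes the "random" rows repeatable, which helps when comparing output between runs.

```python
import pandas as pd

# Hypothetical height/weight values, used only to illustrate the calls
df = pd.DataFrame({"Height": range(150, 180), "Weight": range(50, 80)})

first10 = df.head(10)
last10 = df.tail(10)
# random_state fixes the seed so the same 20 rows come back on every run
random20 = df.sample(n=20, random_state=42)
print(df.shape)  # (30, 2)
```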

...................................................................

SLIP 3

A) Write a Python program to create box plots to see how each feature, i.e. Sepal Length, Sepal Width, Petal Length and Petal Width, is distributed across the three species. (Use iris.csv dataset)

df=pd.read_csv("iris.csv")

df


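#one box plot per feature, grouped by species (feature column names assumed to match the iris.csv of Slips 1 and 21)

df.boxplot(column=['sepallength','sepalwidth','petallength','petalwidth'],by='class',figsize=(12,8))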


B) Write a Python program to view basic statistical details of the data. (Use Heights and Weights Dataset)

df=pd.read_csv("HeightWeight.csv")

df


print(df.describe())   #in a notebook, df.describe() on its own also displays the table

..........................................................................

SLIP 4

A) Generate a random array of 50 integers and display it using a line chart, scatter plot, histogram and box plot. Apply appropriate colors, labels and styling options.

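arr=np.random.randint(1,100,50)   #50 random integers from 1 to 99

arr


plt.plot(arr,color='b',marker='o',linestyle='--')
plt.title("Line Chart")
plt.xlabel("Index")
plt.ylabel("Value")
plt.show()


plt.scatter(range(50),arr,color='r')
plt.title("Scatter Plot")
plt.xlabel("Index")
plt.ylabel("Value")
plt.show()


plt.hist(arr,facecolor='y',linewidth=2,edgecolor='k')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


plt.boxplot(arr)
plt.title("Box Plot")
plt.show()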
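The four charts can also be placed in one figure with Matplotlib's object-oriented API. This is a sketch only: the `Agg` backend line is an assumption so it runs without a display (drop it in a notebook), and the colors are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(1, 100, size=50)  # 50 random integers in [1, 100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].plot(arr, color="tab:blue", marker="o", linestyle="--")
axes[0, 0].set(title="Line chart", xlabel="Index", ylabel="Value")
axes[0, 1].scatter(np.arange(arr.size), arr, color="tab:red")
axes[0, 1].set(title="Scatter plot", xlabel="Index", ylabel="Value")
axes[1, 0].hist(arr, bins=10, color="gold", edgecolor="black")
axes[1, 0].set(title="Histogram", xlabel="Value", ylabel="Frequency")
axes[1, 1].boxplot(arr)
axes[1, 1].set(title="Box plot", ylabel="Value")
fig.tight_layout()  # call plt.show() afterwards when using an interactive backend
```

Working with explicit `axes` objects avoids the common mistake of a later `plt.figure()` call creating an empty window after the plot has already been drawn.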


B) Write a Python program to print the shape, number of rows and columns, data types, feature names and the description of the data. (Use User_Data.csv)

df=pd.read_csv("User_Data.csv")

df


df.shape   #(rows, columns)


len(df.axes[0])   #number of rows


len(df.axes[1])   #number of columns


df.dtypes


list(df.columns)   #feature names


df.describe()

...............................................

SLIP 5


(SAME AS SLIP 4)


SLIP 6


(SAME AS SLIP 2)

.................................................

SLIP 7

A) Write a Python program to perform the following tasks:

a. Apply one-hot encoding on the Country column.

b. Apply label encoding on the Purchased column.

(Data.csv has two categorical columns: Country and Purchased.)

Solution:

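from sklearn.preprocessing import OneHotEncoder

df=pd.read_csv("Data.csv")

enc=OneHotEncoder(handle_unknown='ignore')

#one 0/1 column per country; categories_[0] holds the country names in column order

enc_df=pd.DataFrame(enc.fit_transform(df[['Country']]).toarray(),columns=enc.categories_[0])

enc_df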


from sklearn.preprocessing import LabelEncoder

le=LabelEncoder()

df['Purchased']=le.fit_transform(df['Purchased'])

df
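For comparison, pandas alone can do both encodings. This sketch uses a few made-up rows shaped like Data.csv's categorical columns; `get_dummies` plays the role of the one-hot encoder, and category codes reproduce `LabelEncoder`'s alphabetical numbering.

```python
import pandas as pd

# Hypothetical rows shaped like Data.csv's categorical columns
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "No"],
})

# One-hot: one 0/1 column per country, prefixed with the original column name
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)
# Label encoding by hand: categories are sorted alphabetically, so "No"=0, "Yes"=1
encoded["Purchased"] = encoded["Purchased"].astype("category").cat.codes
print(encoded)
```

The scikit-learn encoders are the better choice when the same mapping must be reapplied to new data later, since they store the learned categories.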

......................................................

SLIP 8

Q) Write a program in Python to perform the following task: [15]

Standardize the data (transform it into a standard Gaussian distribution with a mean of 0 and a standard deviation of 1). (Use winequality-red.csv)

#Creating a DataFrame

d = {'C01':[1,3,7,4],'C02':[12,2,7,1],'C03':[22,34,-11,9]}

df2 = pd.DataFrame(d)

print("\n ORIGINAL DATA VALUES")

print("------------------------")

print(df2)


import numpy as np

import scipy.stats as s

from sklearn import preprocessing


#Standardizing Data

print("\n Standardizing Data ")

print("----------------------")

X_train = np.array([[ 1., -1., 2.],[ 2., 0., 0.],[ 0., 1., -1.]])

print(" Original Data \n", X_train)

print("\n Initial Mean : ", s.tmean(X_train).round(2))

print(" Initial Standard Deviation : ",round(X_train.std(),2))

X_scaled = preprocessing.scale(X_train)

X_scaled.mean(axis=0)

X_scaled.std(axis=0)

print("\n Standardized Data \n", X_scaled.round(2))

print("\n Scaled Mean : ",s.tmean(X_scaled).round(2))

print(" Scaled Standard Deviation : ",round(X_scaled.std(),2))
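Standardization is just the z-score formula applied per column: z = (x - column mean) / column standard deviation. The hypothetical helper `zscore_columns` below writes that out with NumPy and checks it against `preprocessing.scale`, using the same `X_train` values as above. It assumes no column is constant, since a zero standard deviation would divide by zero here.

```python
import numpy as np
from sklearn import preprocessing

def zscore_columns(X):
    """Standardize each column: subtract its mean, divide by its population std (ddof=0).

    Assumes no constant columns (their std would be 0).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
manual = zscore_columns(X_train)
print(manual.round(2))
print(np.allclose(manual, preprocessing.scale(X_train)))  # True
```

For the wine data itself, the same helper could be applied to the feature columns, e.g. `zscore_columns(df.drop(columns="quality"))`. The UCI copy of winequality-red.csv uses `;` as its separator, so it may need `pd.read_csv("winequality-red.csv", sep=";")`.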


...........................................................................

SLIP 9

A)

(SAME AS SLIP 4)

B) Create two lists, one representing subject names and the other representing the marks obtained in those subjects. Display the data in a pie chart.


sub_name=["Ds","Python","c","java"]

sub_marks=[78,67,87,67]

plt.pie(sub_marks,labels=sub_name)


C) Write a program in Python to perform the following tasks (Use winequality-red.csv): [5]

Import the dataset and do the following:

a) Describing the dataset 

b) Shape of the dataset 

c) Display first 3 rows from dataset


df=pd.read_csv("winequality-red.csv")

df


df.describe()


df.shape


df.head(3)


..............................................................

SLIP 10

A) Write a Python program to display the column-wise mean and median for the SOCR HeightWeight dataset.


df=pd.read_csv("HeightWeight.csv")

df


df.mean(numeric_only=True)   #mean of every numeric column


df.median(numeric_only=True)   #median of every numeric column

B) Write a Python program to compute the sum of the Manhattan distances between all pairs of points.


from scipy.spatial.distance import pdist

import pandas as pd


#define DataFrame: each row is one point (A, B, C)

df=pd.DataFrame({'A':[2,4,4,6],'B':[5,5,7,8],'C':[9,12,12,13]})


#pdist gives the Manhattan (cityblock) distance for every pair of rows; add them up

pdist(df.values,metric='cityblock').sum()   #returns 35.0
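The sum over all pairs of points (rows) can also be computed without enumerating every pair, because Manhattan distance splits into independent per-dimension terms. In each dimension, after sorting, the k-th smallest value (0-based) exceeds the k values before it, so it contributes k*value minus their sum. This is a sketch of that O(n log n) per-dimension trick; `total_manhattan` is a hypothetical helper name, and `pdist` is used only as a cross-check.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

def total_manhattan(points):
    """Sum of Manhattan distances over all pairs of rows, one dimension at a time."""
    pts = np.asarray(points, dtype=float)
    total = 0.0
    for col in pts.T:
        s = np.sort(col)
        k = np.arange(s.size)
        # prefix[k] = sum of the k values that come before s[k]
        prefix = np.concatenate(([0.0], np.cumsum(s)[:-1]))
        total += np.sum(k * s - prefix)
    return total

df = pd.DataFrame({'A': [2, 4, 4, 6], 'B': [5, 5, 7, 8], 'C': [9, 12, 12, 13]})
print(total_manhattan(df.values))                  # 35.0
print(pdist(df.values, metric='cityblock').sum())  # 35.0
```

The pairwise version needs memory for n(n-1)/2 distances, while the sorting version only stores one sorted column at a time, which matters once there are many thousands of points.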



..........................................

SLIP 11

(SAME AS SLIP 1)


SLIP 12

A)

(SAME AS SLIP 4)

B) Write a Python program to create a data frame with the columns name, salary and department, and add 10 rows with some missing and duplicate values. Then drop all null and empty values and print the modified data frame.


df=pd.DataFrame(columns=['name','salary','department'])

df.loc[0]=['Bharat',20000,'Sales']

df.loc[1]=['Mitali',61000,'Purchase']

df.loc[2]=['Sakshi',61000,'Account']

df.loc[3]=['Aditya',90000,'Sales']

df.loc[4]=['Rahul',None,'ABC']

df.loc[5]=['Ganesh',11000,None]

df.loc[6]=['Siddhi',20000,'DEF']

df.loc[7]=['Tanu',None,None]

df.loc[8]=['Priya',45000,'XYZ']

df.loc[9]=['Vrushali',50000,'Purchase']


df=df.dropna()   #dropna returns a new DataFrame, so assign it back

df
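`dropna()` only removes true nulls (`None`/`NaN`); an empty string such as `''` is not null, and duplicate rows are also left in place. A minimal sketch that handles both, assuming blank or whitespace-only strings should count as "empty"; the rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one blank department, one missing salary, one exact duplicate
df = pd.DataFrame({
    "name": ["Bharat", "Mitali", "Rahul", "Ganesh", "Bharat"],
    "salary": [20000, 61000, None, 11000, 20000],
    "department": ["Sales", "Purchase", "ABC", "", "Sales"],
})

# Treat empty or whitespace-only strings as missing, then drop nulls and repeated rows
cleaned = (
    df.replace(r"^\s*$", np.nan, regex=True)
      .dropna()
      .drop_duplicates()
      .reset_index(drop=True)
)
print(cleaned)
```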


..............................................................

SLIP 13

A) Write a Python program to create a graph to find the relationship between petal length and petal width. (Use iris.csv dataset)


from sklearn import preprocessing

import seaborn as sns


iris=pd.read_csv("iris.csv")

print(iris.head())


le=preprocessing.LabelEncoder()

iris.species=le.fit_transform(iris.species)

print(iris.head())


sns.scatterplot(data=iris,x='petal_length',y='petal_width',hue='species')

plt.show()
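A scatter plot shows the relationship visually; the Pearson correlation coefficient puts a number on it. This sketch reuses the `petal_length`/`petal_width` column names from the seaborn call above, but the six measurements are hypothetical values, not the real iris.csv.

```python
import pandas as pd

# Hypothetical petal measurements (cm); with the real file use pd.read_csv("iris.csv")
iris = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Pearson r: +1 means a perfect increasing linear relationship, 0 means no linear relationship
r = iris["petal_length"].corr(iris["petal_width"])
print(round(r, 3))
```

A value close to +1 supports what the scatter plot suggests: longer petals tend to be wider.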



B) Write a Python program to find the maximum and minimum values of a given flattened array.

x=[[0,1],[2,3]]

np.max(x)

np.min(x)


...................................................................

SLIP 14

A) Write a Python NumPy program to compute the weighted average along the specified axis of a given flattened array.


import numpy as np

x=np.arange(5)

print(x)

weights=np.arange(1,6)

r1=np.average(x,weights=weights)

print(r1)
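The example above works on a 1-D array. For the "along the specified axis" part of the question, `np.average` also takes an `axis` argument; the weights must then have the same length as that axis. A small sketch with made-up values:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
w = np.array([1, 2, 3])          # one weight per column

# axis=1: weighted average across each row, e.g. (0*1 + 1*2 + 2*3) / 6 for the first row
row_avg = np.average(x, axis=1, weights=w)
# axis=0: weighted average down each column, with one weight per row
col_avg = np.average(x, axis=0, weights=[1, 3])
print(row_avg)
print(col_avg)  # [2.25 3.25 4.25]
```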


B)Write a Python program to view basic statistical details of the data (Use advertising.csv)


df=pd.read_csv("advertising.csv")

df


print(df.describe())   #in a notebook, df.describe() on its own also displays the table


.................................................................

SLIP 15

A)

(SAME AS SLIP 4)


B)

(SAME AS SLIP 9)


........................................................................

SLIP 16

A)

(SAME AS SLIP 9 B, but display the marks as a bar chart)

plt.bar(sub_name,sub_marks)

B) Write a Python program to create a data frame for students' information such as name, graduation percentage and age. Display the average age of the students and the average graduation percentage.


import pandas as pd

import numpy as np

student  = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],

            'graduation percentage': [80,70,89,55,80,66,77,55,45,88],

            'age': [21, 33, 22, 23, 22, 13, 19, 17, 20, 19]}

df = pd.DataFrame(student)

print("\nAverage age of the students:")

print(df['age'].mean())

print("\nAverage graduation percentage:")

print(df['graduation percentage'].mean())


.............................................................................


SLIP 17

A)

(SAME AS SLIP 13)


B)

(SAME AS SLIP 12)


SLIP 18

A) Write a Python program to create box plots to see how each feature, i.e. Sepal Length, Sepal Width, Petal Length and Petal Width, is distributed across the three species. (Use iris.csv dataset)

df=pd.read_csv("iris.csv")

df


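#one box plot per feature, grouped by species (feature column names assumed to match the iris.csv of Slips 1 and 21)

df.boxplot(column=['sepallength','sepalwidth','petallength','petalwidth'],by='class',figsize=(12,8))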


B)

(SAME AS SLIP 2)


SLIP 19

A)

(REFER SLIP 12)

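B) Print the shape, number of rows and columns, data types, feature names and the description of the data.

(REFER SLIP 4 B for the shape, row and column counts, data types and description. The feature names can be listed with:)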


list(df.columns)


C) Add 5 rows with duplicate and missing values, and add a column 'remarks' with empty values. Display the data.


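#5 new rows: some repeat earlier rows, some have missing values

new_rows=pd.DataFrame([['Bharat',20000,'Sales'],['Mitali',61000,'Purchase'],['Bharat',20000,'Sales'],['Rahul',None,'ABC'],['Tanu',None,None]],columns=['name','salary','department'])

df=pd.concat([df,new_rows],ignore_index=True)


duplicate=df[df.duplicated()]

print("Duplicate Rows")

duplicate


df.isnull()


df["remarks"]=None

print("DataFrame after adding the remarks column")

df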
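By default `df.duplicated()` marks only the second and later copies of a row (`keep='first'`), so the first occurrence is not listed. Passing `keep=False` marks every copy. A short sketch with hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Bharat", "Mitali", "Bharat", "Priya"],
    "salary": [20000, 61000, 20000, 45000],
})
print(df.duplicated().tolist())            # [False, False, True, False]
print(df.duplicated(keep=False).tolist())  # [True, False, True, False]
print(df.isnull().sum())                   # count of missing values per column
```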



...............................................................

SLIP 20


A) 

(SAME AS SLIP 12)

B)Add two outliers to the above data and display the box plot.

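outliers=pd.DataFrame([['Outlier1',400000,'Sales'],['Outlier2',500000,'Account']],columns=['name','salary','department'])   #two extreme salaries
df=pd.concat([df,outliers],ignore_index=True)

fig=plt.figure(figsize=(10,7))   #create the figure before plotting
plt.boxplot(df['salary'].dropna().astype(float))
plt.title("Salary with outliers")
plt.show()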
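The points a box plot draws separately are, by default, those outside the 1.5 × IQR whiskers (Matplotlib's default `whis=1.5`). This sketch applies the same rule numerically. The hypothetical helper `iqr_outliers` and the salary list (the non-missing Slip 12 salaries plus two made-up extreme values) are for illustration only.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

salaries = [20000, 61000, 61000, 90000, 20000, 45000, 50000, 400000, 500000]
print(iqr_outliers(salaries).tolist())  # [400000.0, 500000.0]
```

Checking the numbers this way confirms that the added values really are outliers and not just large values that still fall inside the whiskers.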



....................................................

SLIP 21 & 24

A)

df=pd.read_csv("iris.csv")

df


import seaborn as sns

fig,ax=plt.subplots(1,1,figsize=(10,8))

sns.countplot(x='class',data=df,ax=ax)

plt.title("Iris Species Count")

plt.show()


x=df["sepallength"]


plt.hist(x,bins=20,color="aqua")

plt.title("Sepal length in cm")

plt.xlabel("Sepal_length_cm")

plt.ylabel("Count")


.........................................................

SLIP 22 & 23


#Creating a DataFrame

d = {'C01':[1,3,7,4],'C02':[12,2,7,1],'C03':[22,34,-11,9]}

df2 = pd.DataFrame(d)

print("\n ORIGINAL DATA VALUES")

print("------------------------")

print(df2)


from sklearn import preprocessing

#Method 1: Rescaling Data

print("\n\n Data Scaled Between 0 to 1")

data_scaler = preprocessing.MinMaxScaler(feature_range = (0, 1))

data_scaled = data_scaler.fit_transform(df2)

print("\n Min Max Scaled Data ")

print("-----------------------")

print(data_scaled.round(2))


import scipy.stats as s

#Method 2: L1 normalization rescales each row so that the sum of its absolute values is 1.

dn = preprocessing.normalize(df2, norm = 'l1')

print("\n L1 Normalized Data ")

print(" ----------------------")

print(dn.round(2))


#Method 3: Binarize Data (Make Binary)

data_binarized = preprocessing.Binarizer(threshold=5).transform(df2)

print("\n Binarized data ")

print(" -----------------")

print(data_binarized)


#Method 4: Standardizing Data

print("\n Standardizing Data ")

print("----------------------")

X_train = np.array([[ 1., -1., 2.],[ 2., 0., 0.],[ 0., 1., -1.]])

print(" Original Data \n", X_train)

print("\n Initial Mean : ", s.tmean(X_train).round(2))

print(" Initial Standard Deviation : ",round(X_train.std(),2))

X_scaled = preprocessing.scale(X_train)

X_scaled.mean(axis=0)

X_scaled.std(axis=0)

print("\n Standardized Data \n", X_scaled.round(2))

print("\n Scaled Mean : ",s.tmean(X_scaled).round(2))

print(" Scaled Standard Deviation : ",round(X_scaled.std(),2))
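The first two methods are simple formulas, and writing them out with NumPy makes clear what the scikit-learn calls compute. This sketch uses the same values as `df2` and checks both results against `MinMaxScaler` and `normalize`.

```python
import numpy as np
from sklearn import preprocessing

# Same values as df2 above (rows of C01, C02, C03)
X = np.array([[1, 12, 22], [3, 2, 34], [7, 7, -11], [4, 1, 9]], dtype=float)

# Min-max per column: (x - min) / (max - min) maps each column onto [0, 1]
minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# L1 per row: divide by the sum of absolute values, so each row's |values| sum to 1
l1 = X / np.abs(X).sum(axis=1, keepdims=True)

print(np.allclose(minmax, preprocessing.MinMaxScaler().fit_transform(X)))  # True
print(np.allclose(l1, preprocessing.normalize(X, norm="l1")))              # True
```

Note the direction: min-max scaling works down each column (feature), while L1 normalization works across each row (sample).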


Thanks!!!

