*NOTE: Import libraries wherever necessary.
*NOTE: Call plt.show() wherever it is required.
SLIP 1
A) Write a Python program to create a Pie plot to get the frequency of the three species of
the Iris data (Use iris.csv)
import pandas as pd
import numpy as np
import scipy.stats as sc
import matplotlib.pyplot as plt
df=pd.read_csv("iris.csv")
df
fig,ax=plt.subplots(1,1,figsize=(10,8))
df['class'].value_counts().plot.pie(explode=[0.1,0.1,0.1],autopct='%1.1f%%',shadow=True,ax=ax)
plt.title("Iris Species ")
plt.show()
B)Write a Python program to view basic statistical details of the data. (Use winequality-red.csv)
df=pd.read_csv("winequality-red.csv")
df
print(df.describe())
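To see what describe() returns without the CSV at hand, here is a minimal sketch on a tiny made-up table. The column names and values are assumptions for illustration only, not rows from the real file.

```python
import pandas as pd

# Tiny made-up table standing in for the CSV (values are illustrative only)
demo = pd.DataFrame({"alcohol": [9.4, 9.8, 10.0, 11.2],
                     "quality": [5, 5, 6, 7]})

stats = demo.describe()   # count, mean, std, min, quartiles, max per numeric column
print(stats)
print(stats.loc["mean", "alcohol"])   # pick out a single statistic
```

The result is itself a DataFrame, so single statistics can be read with .loc as shown.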
...............................................
SLIP 2
A)Write a Python program for Handling Missing Value. Replace missing value of salary,
age column with mean of that column.(Use Data.csv file).
df=pd.read_csv("Data.csv")
df
#fillna returns a new Series, so assign the result back to the column
df['Age']=df['Age'].fillna(df['Age'].mean())
df['Salary']=df['Salary'].fillna(df['Salary'].mean())
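The same mean replacement can be tried on a few made-up rows, which makes it easy to check that no gaps remain. This is a sketch only: the column names follow the slip, but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps; column names follow the slip, values are illustrative
demo = pd.DataFrame({"Age": [44.0, 27.0, np.nan, 38.0],
                     "Salary": [72000.0, np.nan, 54000.0, 61000.0]})

# fillna returns a new Series, so assign the result back to the column
for col in ["Age", "Salary"]:
    demo[col] = demo[col].fillna(demo[col].mean())

print(demo)
print(demo.isnull().sum())   # all zeros once the gaps are filled
```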
B)Write a Python program to generate a line plot of name Vs salary
plt.plot(df['name'],df['salary'])
plt.xlabel("Name")
plt.ylabel("Salary")
plt.show()
C)Download the heights and weights dataset and load it from the given csv file into a
dataframe. Print the first 10 rows, the last 10 rows and 20 random rows; also display the shape of the
dataset.
df=pd.read_csv("HeightWeight.csv")
#Alternative without a file: build the DataFrame by hand (values elided; both lists need the same length)
#HeightWeight={'Height':[1,2,3,4,6,8,3,6,0,2,6,8,......],'Weight':[5,9,3,0,5,.....]}
#df=pd.DataFrame(HeightWeight)
df.head(10)
df.tail(10)
df.sample(20)
df.shape
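If the CSV is not available, the row-selection calls can still be tried on synthetic data. The sketch below assumes 100 made-up rows drawn from a normal distribution; the means and spreads are arbitrary choices, not properties of the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed so the demo is repeatable
# Synthetic stand-in for the heights and weights file (100 made-up rows)
demo = pd.DataFrame({"Height": rng.normal(68, 2, 100),
                     "Weight": rng.normal(127, 10, 100)})

first10 = demo.head(10)      # first 10 rows
last10 = demo.tail(10)       # last 10 rows
random20 = demo.sample(20)   # 20 rows picked at random, without replacement
print(first10, last10, random20, sep="\n\n")
print(demo.shape)            # shape is an attribute, not a method
```

Note that sample(20) needs at least 20 rows unless replace=True is passed.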
...................................................................
SLIP 3
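A)Write a Python program to create box plots to see how each feature i.e. Sepal Length,
Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use
iris.csv dataset)
df=pd.read_csv("iris.csv")
df
#one box plot per numeric feature, grouped by the species column ('class' in this file)
df.boxplot(by='class',figsize=(10,8))
plt.show()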
B)Write a Python program to view basic statistical details of the data (Use Heights and
Weights Dataset)
df=pd.read_csv("HeightWeight.csv")
df
print(df.describe())
..........................................................................
SLIP 4
A)Generate a random array of 50 integers and display them using a line chart, scatter
plot, histogram and box plot. Apply appropriate color, labels and styling options.
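data=np.random.randint(1,100,50)   #50 random integers; the range 1-99 is an arbitrary choice
print(data)
plt.plot(data,color='b',marker='o')
plt.title("Line chart")
plt.show()
plt.scatter(range(50),data,color='r')
plt.title("Scatter plot")
plt.show()
plt.hist(data,facecolor='y',linewidth=2,edgecolor='k')
plt.title("Histogram")
plt.show()
plt.boxplot(data)
plt.title("Box plot")
plt.show()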
B)Write a Python program to print the shape, number of rows-columns, data types,
feature names and the description of the data(Use User_Data.csv)
df=pd.read_csv("User_Data.csv")
df
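df.shape            #(rows, columns)
len(df.axes[0])     #number of rows
len(df.axes[1])     #number of columns
df.dtypes           #data type of each column
list(df.columns)    #feature names
df.describe()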
...............................................
SLIP 5)
(SAME AS SLIP 4)
SLIP 6)
(SAME AS SLIP 2)
.................................................
SLIP 7)
A)Write a Python program to perform the following tasks :
a. Apply OneHot coding on Country column.
b. Apply Label encoding on purchased column
(Data.csv have two categorical column the country column, and the purchased column).
Solution:
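from sklearn.preprocessing import OneHotEncoder
df=pd.read_csv("Data.csv")
enc=OneHotEncoder(handle_unknown='ignore')
#name the new columns after the country categories found by the encoder
enc_df=pd.DataFrame(enc.fit_transform(df[['Country']]).toarray(),columns=enc.categories_[0])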
enc_df
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
df['Purchased']=le.fit_transform(df['Purchased'])
df
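The same two encodings can also be sketched with pandas alone, which is a different technique from the scikit-learn encoders used in the solution. The rows below are made up to resemble Data.csv and are an assumption for illustration.

```python
import pandas as pd

# Made-up rows shaped like Data.csv (illustrative values only)
demo = pd.DataFrame({"Country": ["France", "Spain", "Germany", "Spain"],
                     "Purchased": ["No", "Yes", "No", "Yes"]})

# One-hot: one indicator column per distinct country
one_hot = pd.get_dummies(demo["Country"], prefix="Country")

# Label encoding: categories are sorted, so "No" -> 0 and "Yes" -> 1
demo["Purchased"] = demo["Purchased"].astype("category").cat.codes

print(pd.concat([demo, one_hot], axis=1))
```

Each row has exactly one indicator set, which is a quick sanity check for any one-hot encoding.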
......................................................
SLIP 8
Q)Write a program in Python to perform the following task : [15]
Standardizing Data (transform them into a standard Gaussian distribution with a mean
of 0 and a standard deviation of 1) (Use winequality-red.csv)
import numpy as np
import pandas as pd
import scipy.stats as s
from sklearn import preprocessing
#Creating a DataFrame
d = {'C01':[1,3,7,4],'C02':[12,2,7,1],'C03':[22,34,-11,9]}
df2 = pd.DataFrame(d)
print("\n ORIGINAL DATA VALUES")
print("------------------------")
print(df2)
#Standardizing Data
print("\n Standardizing Data ")
print("----------------------")
#small sample matrix; to use the CSV instead, pass the numeric columns of the loaded DataFrame
X_train = np.array([[ 1., -1., 2.],[ 2., 0., 0.],[ 0., 1., -1.]])
print(" Original Data \n", X_train)
print("\n Initial Mean : ", s.tmean(X_train).round(2))
print(" Initial Standard Deviation : ",round(X_train.std(),2))
X_scaled = preprocessing.scale(X_train)
print("\n Column Means : ", X_scaled.mean(axis=0).round(2))
print(" Column Standard Deviations : ", X_scaled.std(axis=0).round(2))
print("\n Standardized Data \n", X_scaled.round(2))
print("\n Scaled Mean : ",s.tmean(X_scaled).round(2))
print(" Scaled Standard Deviation : ",round(X_scaled.std(),2))
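Standardizing is just a column-wise z-score, so it can be reproduced by hand with NumPy. The sketch below reuses the sample matrix from the solution and assumes the population standard deviation (ddof=0), which is NumPy's default.

```python
import numpy as np

X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# z-score per column: subtract the column mean, divide by the column
# standard deviation (population form, ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_manual.round(2))
print(X_manual.mean(axis=0).round(2))   # approximately 0 for every column
print(X_manual.std(axis=0).round(2))    # 1 for every column
```

A column with zero spread would cause a division by zero here, so constant columns need separate handling.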
...........................................................................
SLIP 9
A)
(SAME AS SLIP 4)
B)Create two lists, one representing subject names and the other representing marks
obtained in those subjects. Display the data in a pie chart.
sub_name=["Ds","Python","c","java"]
sub_marks=[78,67,87,67]
plt.pie(sub_marks,labels=sub_name,autopct='%1.1f%%')
plt.show()
C)Write a program in Python to perform the following task (Use winequality-red.csv) [5]
Import the dataset and do the following:
a) Describing the dataset
b) Shape of the dataset
c) Display first 3 rows from dataset
df=pd.read_csv("winequality-red.csv")
df
df.describe()
df.shape
df.head(3)
..............................................................
SLIP 10
A)Write a Python program to display the column-wise mean and median for the SOCRHeightWeight dataset.
df=pd.read_csv("HeightWeight.csv")
df
df.mean(numeric_only=True)     #mean of every numeric column
df.median(numeric_only=True)   #median of every numeric column
B)Write a python program to compute sum of Manhattan distance between all pairs of
points.
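from scipy.spatial.distance import pdist
import pandas as pd
#define DataFrame; each row is treated as one point with coordinates (A,B,C)
df=pd.DataFrame({'A':[2,4,4,6],'B':[5,5,7,8],'C':[9,12,12,13]})
#pdist returns the Manhattan (cityblock) distance for every pair of rows; add them up
print(pdist(df.values,metric='cityblock').sum())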
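The sum over all pairs can also be written out in plain Python, which makes the pairing explicit. This sketch treats each row of the DataFrame above as one point; the helper name manhattan is made up for the example.

```python
from itertools import combinations

# Each tuple is one point; same values as the rows of the DataFrame above
points = [(2, 5, 9), (4, 5, 12), (4, 7, 12), (6, 8, 13)]

def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Visit every unordered pair exactly once and add up the distances
total = sum(manhattan(p, q) for p, q in combinations(points, 2))
print(total)   # 35
```

With four points there are six pairs, and their distances are 5, 7, 11, 2, 6 and 4.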
..........................................
SLIP 11
(SAME AS SLIP 1)
SLIP 12
A)
(SAME AS SLIP 4)
B)Write a Python program to create a data frame containing the columns name, salary and department.
Add 10 rows with some missing and duplicate values to the data frame. Also drop all null and
empty values. Print the modified data frame.
df=pd.DataFrame(columns=['name','salary','department'])
df.loc[0]=['Bharat',20000,'Sales']
df.loc[1]=['Mitali',61000,'Purchase']
df.loc[2]=['Sakshi',61000,'Account']
df.loc[3]=['Aditya',90000,'Sales']
df.loc[4]=['Rahul',None,'ABC']
df.loc[5]=['Ganesh',11000,None]
df.loc[6]=['Siddhi',20000,'DEF']
df.loc[7]=['Tanu',None,None]
df.loc[8]=['Priya',45000,'XYZ']
df.loc[9]=['Vrushali',50000,'Purchase']
#dropna returns a new DataFrame, so assign it back
df=df.dropna()
print(df)
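The question also mentions duplicate values, which dropna alone does not remove. A minimal sketch on made-up rows (the names and salaries are assumptions for illustration) that removes both duplicates and rows with missing entries:

```python
import pandas as pd

# Made-up rows: one exact duplicate and two rows with missing entries
demo = pd.DataFrame({"name": ["Asha", "Asha", "Ravi", "Neha"],
                     "salary": [30000, 30000, None, 45000],
                     "department": ["Sales", "Sales", "HR", None]})

# Both calls return new frames, so chain them and keep the result
cleaned = demo.drop_duplicates().dropna().reset_index(drop=True)
print(cleaned)
```

Only the first "Asha" row survives: its copy is a duplicate and the other two rows contain a missing value.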
..............................................................
SLIP 13
A)Write a Python program to create a graph to find relationship between the petal length
and petal width.(Use iris.csv dataset)
from sklearn import preprocessing
import seaborn as sns
iris=pd.read_csv("iris.csv")   #this slip assumes columns named petal_length, petal_width and species
print(iris.head())
le=preprocessing.LabelEncoder()
iris.species=le.fit_transform(iris.species)
print(iris.head())
sns.scatterplot(data=iris,x='petal_length',y='petal_width',hue='species')
plt.show()
B)Write a Python program to find the maximum and minimum value of a given flattened
array
x=np.array([[0,1],[2,3]])
flat=x.flatten()    #1-D copy: [0 1 2 3]
print(flat.max())
print(flat.min())
...................................................................
SLIP 14
A) Write a Python NumPy program to compute the weighted average along the specified
axis of a given flattened array.
import numpy as np
x=np.arange(5)
print(x)
weights=np.arange(1,6)
r1=np.average(x,weights=weights)
print(r1)
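The weighted average is the sum of value times weight divided by the sum of the weights, which can be checked by hand. The 2-D part of this sketch is an added illustration of the axis argument; the array grid and its weights are made up.

```python
import numpy as np

x = np.arange(5)            # values 0..4
weights = np.arange(1, 6)   # weights 1..5

# Weighted mean = sum(value * weight) / sum(weights)
by_hand = (x * weights).sum() / weights.sum()
print(by_hand, np.average(x, weights=weights))

# For a 2-D array, axis selects the direction that is averaged
grid = np.arange(6).reshape(2, 3)
print(np.average(grid, axis=1, weights=[1, 2, 3]))
```

When axis is given, the weights list must have as many entries as the array has along that axis.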
B)Write a Python program to view basic statistical details of the data (Use advertising.csv)
df=pd.read_csv("advertising.csv")
df
print(df.describe())
.................................................................
SLIP 15
A)
(SAME AS SLIP 4)
B)
(SAME AS SLIP 9)
........................................................................
SLIP 16
A)
(SAME AS SLIP 9)
plt.bar(sub_name,sub_marks)
B)Write a python program to create a data frame for students’ information such as name,
graduation percentage and age. Display average age of students, average of graduation
percentage.
import pandas as pd
import numpy as np
student = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'graduation percentage': [80,70,89,55,80,66,77,55,45,88],
'age': [21, 33, 22, 23, 22, 13, 19, 17, 20, 19]}
df = pd.DataFrame(student)
print("\nMean age of the students:")
print(df['age'].mean())
print("\nMean graduation percentage of the students:")
print(df['graduation percentage'].mean())
.............................................................................
SLIP 17
A)
(SAME AS SLIP 13)
B)
(SAME AS SLIP 12)
SLIP 18
A)Write a Python program to create box plots to see how each feature i.e. Sepal Length,
Sepal Width, Petal Length, Petal Width are distributed across the three species. (Use iris.csv
dataset)
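df=pd.read_csv("iris.csv")
df
#one box plot per numeric feature, grouped by the species column ('class' in this file)
df.boxplot(by='class',figsize=(10,8))
plt.show()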
B)
(SAME AS SLIP 2)
SLIP 19
A)
(REFER SLIP 12)
B)Print the shape, number of rows-columns, data types, feature names and the description of
the data
#shape, row/column counts, dtypes and describe() are obtained as in SLIP 4 B; feature names:
list(df.columns)
C)Add 5 rows with duplicate values and missing values. Add a column ‘remarks’ with empty
values. Display the data.
duplicate=df[df.duplicated()]
print("Duplicate Rows")
duplicate
df.isnull()
df["remarks"]=None
print("DataFrame after adding the remarks column")
df
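The solution inspects duplicates and nulls but does not show the step of adding the five rows. One possible way to do it is pd.concat, sketched here on a small made-up frame; all names and salaries are invented for illustration.

```python
import pandas as pd

# Small made-up starting frame (illustrative values only)
demo = pd.DataFrame({"name": ["Asha", "Ravi"],
                     "salary": [30000, 42000],
                     "department": ["Sales", "HR"]})

# Five extra rows: two exact copies of existing rows, three with missing values
extra = pd.DataFrame({"name": ["Asha", "Ravi", "Neha", None, "Kiran"],
                      "salary": [30000, 42000, None, 25000, None],
                      "department": ["Sales", "HR", "Sales", "HR", None]})
demo = pd.concat([demo, extra], ignore_index=True)

demo["remarks"] = None            # new column with empty values
print(demo[demo.duplicated()])    # the repeated rows
print(demo.isnull().sum())        # missing values per column
```

ignore_index=True renumbers the rows from 0, so the appended rows cannot collide with existing index labels.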
...............................................................
SLIP 20
A)
(Same As Slip 12)
B)Add two outliers to the above data and display the box plot.
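#two made-up outlier rows appended to the SLIP 12 data frame
outliers=pd.DataFrame({'name':['Out1','Out2'],'salary':[900000,1],'department':['Sales','Sales']})
df=pd.concat([df,outliers],ignore_index=True)
fig=plt.figure(figsize=(10,7))
plt.boxplot(df['salary'].dropna().astype(float))
plt.show()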
....................................................
SLIP 21 & 24
A)
df=pd.read_csv("iris.csv")
df
import seaborn as sns
fig,ax=plt.subplots(1,1,figsize=(10,8))
sns.countplot(x='class',data=df,ax=ax)
plt.title("Iris Species Count")
plt.show()
x=df["sepallength"]
plt.hist(x,bins=20,color="aqua")
plt.title("Sepal length in cm")
plt.xlabel("Sepal_length_cm")
plt.ylabel("Count")
plt.show()
.........................................................
SLIP 22 & 23
#Creating a DataFrame
d = {'C01':[1,3,7,4],'C02':[12,2,7,1],'C03':[22,34,-11,9]}
df2 = pd.DataFrame(d)
print("\n ORIGINAL DATA VALUES")
print("------------------------")
print(df2)
from sklearn import preprocessing
#Method 1: Rescaling Data
print("\n\n Data Scaled Between 0 to 1")
data_scaler = preprocessing.MinMaxScaler(feature_range = (0, 1))
data_scaled = data_scaler.fit_transform(df2)
print("\n Min Max Scaled Data ")
print("-----------------------")
print(data_scaled.round(2))
import scipy.stats as s
#Method 2: L1 normalization rescales each row so that the absolute values in it sum to 1.
dn = preprocessing.normalize(df2, norm = 'l1')
print("\n L1 Normalized Data ")
print(" ----------------------")
print(dn.round(2))
#Method 3: Binarize Data (Make Binary)
data_binarized = preprocessing.Binarizer(threshold=5).transform(df2)
print("\n Binarized data ")
print(" -----------------")
print(data_binarized)
#Method 4: Standardizing Data
print("\n Standardizing Data ")
print("----------------------")
X_train = np.array([[ 1., -1., 2.],[ 2., 0., 0.],[ 0., 1., -1.]])
print(" Original Data \n", X_train)
print("\n Initial Mean : ", s.tmean(X_train).round(2))
print(" Initial Standard Deviation : ",round(X_train.std(),2))
X_scaled = preprocessing.scale(X_train)
print("\n Column Means : ", X_scaled.mean(axis=0).round(2))
print(" Column Standard Deviations : ", X_scaled.std(axis=0).round(2))
print("\n Standardized Data \n", X_scaled.round(2))
print("\n Scaled Mean : ",s.tmean(X_scaled).round(2))
print(" Scaled Standard Deviation : ",round(X_scaled.std(),2))
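Methods 1 and 2 can be reproduced with plain NumPy to see what the scikit-learn calls compute. This sketch uses the same numbers as df2, written out row by row; it is a hand calculation for comparison, not the library's implementation.

```python
import numpy as np

# Rows of df2: columns C01, C02, C03
data = np.array([[1, 12, 22], [3, 2, 34], [7, 7, -11], [4, 1, 9]], dtype=float)

# Min-max scaling per column: (x - min) / (max - min) maps each column onto [0, 1]
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)
print(scaled.round(2))

# L1 normalisation per row: divide by the sum of absolute values in that row
l1 = data / np.abs(data).sum(axis=1, keepdims=True)
print(l1.round(2))
```

Because the third row contains -11, its entries sum to 1 only in absolute value.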
Thanks!!!