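I am studying in a Singapore MSBA program (class of 20/21) and hope to land a full-time job by June and start in July. Target roles: BA / DA / DS. It mostly depends on each company's job scope, and I would prefer roles with more business content.
When I previously interviewed for a DS intern position, I was asked to create an RMSE function on the spot, write a SQL query, and answer ML-related questions.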
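For practice, here is a minimal sketch of what an on-the-spot RMSE function could look like. The exact interview prompt is not recorded here, so the function name `rmse`, its arguments, and the sample numbers are all assumptions for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Square each error, average them, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values, only to show the call.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(actual, predicted))
```

With NumPy the same idea is usually written as `np.sqrt(np.mean((y_true - y_pred) ** 2))`, but a plain-Python version avoids depending on any library during a whiteboard-style question.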
2. Python practice:
pandas:
3. ML-related questions:
a. What is the difference between K-means and KNN?
b. Does random forest need cross-validation?
c. How does a decision tree split?
d. What is a neural network? How does it work? How are the weights chosen?
e. What is the difference between random forest and gradient boosting?
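To make question c concrete, here is a small sketch of one common way a classification tree chooses a split: try each candidate threshold on a feature and keep the one with the lowest weighted Gini impurity. This is only an illustration. The helper names (`gini`, `best_threshold`) and the price / room type data are made up, and real libraries use optimized implementations and may use other criteria such as entropy.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: price as the feature, room type as the class label.
prices = [50, 60, 80, 200, 220, 250]
room_types = ["shared", "shared", "shared", "entire", "entire", "entire"]
print(best_threshold(prices, room_types))  # (80, 0.0): a perfectly pure split
```

A full tree repeats this search over every feature and then recurses into the left and right subsets until a stopping rule is met.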
2. df.price.hist(bins=10,by=df.room_type)
Plots the price distribution for each room_type
3. df[["column1","column2"]].dropna().plot()
Plots the two columns together
Show the average price by room type
2. sns.barplot(data=df.loc[df.minimum_nights<7],y="price",hue="room_type",x="minimum_nights")
Shows the average price by minimum_nights for each room_type

Gaussian Kernel Density Estimate
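sns.distplot(data,hist=False,rug=True,kde_kws={"shade":True})
2. sns.kdeplot(data, bw=10,shade=True)
bw = bandwidth of the kernel (not bin width); a larger value gives a smoother curve. Newer seaborn versions deprecate distplot, replace bw with bw_method / bw_adjust, and replace shade with fill.
3. sns.regplot(x="column1",y="column2",data=df)
The regplot function generates a scatterplot with a regression line. It takes lowercase x and y and has no hue parameter; to draw one regression line per group, use sns.lmplot(x="column1",y="column2",hue="column3",data=df).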
Pandas data cleaning
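a. df['Test Score'] = df['Test Score'].fillna(df['Test Score'].interpolate())
interpolate fills each missing value by linear interpolation between the surrounding values; for a single gap this is the average of the values before and after it
b. df["score"]=df["score"].fillna(df["score"].mean()) fills missing values with the column mean
c. df=df.fillna(method="pad") fills each missing value with the previous value (forward fill); recent pandas versions prefer the equivalent df.ffill()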
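A quick side-by-side of the three fill strategies above, as a sketch on a made-up Series (the name `scores` and its values are assumptions for illustration). It also shows that with consecutive gaps, interpolation spaces the filled values evenly rather than using one average for all of them.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one single gap and one double gap.
scores = pd.Series([10.0, np.nan, 30.0, np.nan, np.nan, 60.0])

interpolated = scores.interpolate()        # linear interpolation between neighbours
mean_filled = scores.fillna(scores.mean()) # every gap gets the mean of the known values
forward_filled = scores.ffill()            # every gap copies the previous known value

print(interpolated.tolist())    # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(mean_filled.tolist())
print(forward_filled.tolist())  # [10.0, 10.0, 30.0, 30.0, 30.0, 60.0]
```

Which one is appropriate depends on the data: interpolation suits ordered or time-like values, while mean filling ignores order entirely.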
2. Non-standard missing values
The isnull() function only picks up NaN; it will not pick up other representations of missing values such as a dash ('-'), a blank, or even 'na'
df=df.replace(["-"," ","na"],np.nan)
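A small sketch of why the replace step matters, using a made-up column (the column name `score` and its values are assumptions): before the replacement isnull() finds nothing, afterwards it finds all three placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical column where missing values were typed in as text placeholders.
df = pd.DataFrame({"score": ["90", "-", " ", "na", "75"]})

before = int(df["score"].isnull().sum())       # placeholders are ordinary strings
cleaned = df.replace(["-", " ", "na"], np.nan) # turn them into real NaN
after = int(cleaned["score"].isnull().sum())

print(before, after)  # 0 3
```

If the file is read with pd.read_csv, passing the same placeholders through its na_values argument does this conversion at load time.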
3. Creating a DataFrame from lists or from scratch
a. df=pd.DataFrame()
Months=["jan","feb","mar","may"]
Days=[1,2,3,5]
df["month"]=Months
df["Days"]=Days
4. Renaming DataFrame columns
df=df.rename(columns={'id_x':'purchase_id', 'id_y':'customer_id','id':'product_id'})
rename returns a new DataFrame, so assign the result back (or pass inplace=True)
5. Count whether the date conversion produced errors
print(pd.to_datetime(combinedData['purch_date'], errors='coerce').isnull().value_counts())
print(pd.to_datetime(combinedData['purch_date'], errors='coerce'))
errors="coerce": replaces every value that cannot be parsed with NaT (the datetime missing value); to drop those rows, call dropna afterwards.
errors="raise": raises an exception at the value where the error occurs (this is the default).
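Here is a minimal sketch contrasting the two modes on made-up input. The Series name `purch_date` mirrors the column above, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical purchase dates: one valid, one impossible date, one junk string.
purch_date = pd.Series(["2020-01-15", "2020-02-30", "not a date"])

# coerce: unparseable entries become NaT, so they can be counted or dropped.
parsed = pd.to_datetime(purch_date, errors="coerce")
n_bad = int(parsed.isnull().sum())
print(n_bad)  # 2

# raise: the first unparseable entry stops the conversion with an exception.
try:
    pd.to_datetime(purch_date, errors="raise")
    raised = False
except ValueError as exc:
    raised = True
    print("raise mode failed:", exc)
```

Counting the NaT values first and only then deciding whether to drop them is a cheap way to notice a systematic format problem before it silently removes many rows.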
R coding:

2. names(x): gets (or sets) the names of an object
3. Cumulative functions: return a vector whose elements are the cumulative sums, products, maxima or minima of the elements of the argument.
cumsum(x) / cumprod(x) / cummax(x) / cummin(x)
cumsum(!is.na(x)): returns the running count of non-NA values in x, because TRUE counts as 1 and FALSE as 0
x <- c(1, 2, 0, NA, 4, NA, NA, 6)
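To check my understanding of the running-count trick, here is a sketch of the same computation in Python using only the standard library. None stands in for R's NA, which is an assumption of this translation rather than anything R does itself.

```python
from itertools import accumulate

# Same values as the R vector above; None plays the role of NA.
x = [1, 2, 0, None, 4, None, NA := None, 6][:8] if False else [1, 2, 0, None, 4, None, None, 6]

# Equivalent of !is.na(x): True where a value is present.
not_missing = [value is not None for value in x]

# Equivalent of cumsum(...): True adds 1, False adds 0.
running_count = list(accumulate(int(flag) for flag in not_missing))
print(running_count)  # [1, 2, 3, 3, 4, 4, 4, 5]
```

The count stays flat across each missing value, which is why this pattern is handy for grouping every NA with the last observed value before it.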
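Purpose of this post: a personal reflection on what I have learned and felt since entering a new field, and a summary of the first three months of MSBA coursework
Background: humanities undergraduate (major in political science and international relations, minor in economics), one year on exchange in the US, and three years of work experience in mainland China, mainly as an international sales rep / merchandiser in retail and trading. I decided to take a one-year MSBA to strengthen my data analysis and hard skills.
MSBA program: the one-year program is completed in three terms of about 12 weeks each. A 3-credit course runs for 12 weeks and a 1.5-credit course for 6 weeks. T1 consists of five 3-credit courses.
General Level: 7/10 | Growth Level: 7/10
This course teaches how to apply machine learning algorithms to data; the main tool is googl …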
Learn:
datestr = '1956-01-31'
year, month, day = datestr.split('-')
'/'.join([month, day, year])
4. The dict.items() method lets us iterate over the keys and values of a dictionary simultaneously.
planet_to_initial = {'Mercury': 'M', 'Venus': 'V'}  # example dict, assumed here for illustration
for planet, initial in planet_to_initial.items():
    print("{} begins with \"{}\"".format(planet.rjust(10), initial))
5. enumerate
It allows us to loop over something and have an automatic counter.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1…