
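I am enrolled in the Singapore MSBA 20/21 program. I hope to land a full-time job in June and start in July. Target roles: BA / DA / DS. It mostly depends on each company's job scope, and I would prefer a role with more business content.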

When I interviewed for a DS intern position earlier, I was asked to write an RMSE function and a SQL query on the spot, and to answer ML-related questions.
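For reference, an RMSE function of the kind asked for in such an interview could look like the sketch below. The function name `rmse` and the sample numbers are illustrative assumptions, not the actual interview question.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Square each residual, average them, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
error = rmse(actual, predicted)  # residuals are 1, 0, -2
print(round(error, 4))
```

Writing it with plain lists keeps the answer independent of any library; in practice the same calculation is usually done with NumPy arrays.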

  1. SQL interview practice websites

2. Python practice:

pandas:

3. ML-related questions:

a. What is the difference between K-means and KNN?

b. Does random forest need cross-validation?

c. How does a decision tree split?

d. What is a neural network? How does it work? How are the weights chosen?

e. What is the difference between random forest and gradient boosting?


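  1. df.column.plot(kind="...")

kind can be "line", "bar", "hist", "box", and so on

2. df.price.hist(bins=10, by=df.room_type)

Plots the price distribution for each room_type

3. df[["column1", "column2"]].dropna().plot()

Plots the two columns on the same chart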

import seaborn as sns

  1. sns.barplot(data=df, x="price", y="room_type")

Shows the average price for each room type

2. sns.barplot(data=df.loc[df.minimum_nights < 7], y="price", hue="room_type", x="minimum_nights")

Shows the average price by minimum_nights, split by room_type
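By default `sns.barplot` draws the mean of the numeric variable for each category, so the bar heights can be cross-checked with a plain pandas `groupby`. The sketch below uses a tiny made-up DataFrame. The column names follow the ones above, but the values are assumptions for illustration.

```python
import pandas as pd

# Made-up listings; only the column names come from the notes above.
df = pd.DataFrame({
    "room_type": ["Entire home", "Private room", "Entire home",
                  "Private room", "Shared room"],
    "minimum_nights": [1, 2, 2, 10, 1],
    "price": [200, 80, 300, 60, 40],
})

# Numbers behind the first barplot: average price per room type.
avg_by_room = df.groupby("room_type")["price"].mean()

# Numbers behind the second barplot: keep stays shorter than 7 nights,
# then average price per (minimum_nights, room_type) pair.
short_stays = df.loc[df.minimum_nights < 7]
avg_by_nights_and_room = (
    short_stays.groupby(["minimum_nights", "room_type"])["price"].mean()
)

print(avg_by_room)
print(avg_by_nights_and_room)
```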

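  1. seaborn.distplot: similar to a histogram

Gaussian Kernel Density Estimate

sns.distplot(data, hist=False, rug=True, kde_kws={"shade": True})

Note: distplot is deprecated in newer seaborn releases; histplot, displot, and kdeplot are the replacements.

2. sns.kdeplot(data, bw=10, shade=True)

bw = bandwidth of the kernel (how much smoothing is applied), not bin width. Newer seaborn releases use bw_method / bw_adjust and fill=True in place of bw and shade.

3. sns.lmplot(x="column1", y="column2", hue="column3", data=df)

The regplot function generates a scatterplot with a regression line. Its arguments are lowercase x and y and it has no hue parameter, so use lmplot when the points should be split by a third column.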
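To make the idea of a Gaussian kernel density estimate more concrete, here is a minimal pure-Python sketch of the calculation. It is not how seaborn implements it; the function name `gaussian_kde` and the price values are assumptions for illustration. Each sample contributes one Gaussian bump, and the bandwidth sets how wide each bump is.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at x."""
    n = len(samples)
    # Normalising constant so the estimate integrates to 1.
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density


# Made-up prices for illustration only.
prices = [50, 55, 60, 120, 125, 300]
narrow = gaussian_kde(prices, bandwidth=5)   # little smoothing, sharp peaks
wide = gaussian_kde(prices, bandwidth=50)    # heavy smoothing, flat curve

print(narrow(55), wide(55))
```

A small bandwidth follows the data closely and can look spiky; a large one smooths away detail, which is the trade-off the bandwidth argument controls.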


Pandas data cleaning

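  1. Fill in missing values:

a. df['Test Score'] = df['Test Score'].fillna(df['Test Score'].interpolate())

interpolate fills each missing value by linear interpolation by default; for a single gap, that is the average of the values just before and after it

b. df["score"] = df["score"].fillna(df["score"].mean())  # fill with the column mean

c. df = df.fillna(method="pad")  # fill with the previous value (forward fill); newer pandas versions prefer df.ffill()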

2. Non-standard missing values

isnull() only picks up NaN (and None). It will not pick up other markers of missing data such as a dash ('-'), a blank, or the string 'na'. Replace those with NaN first:

df = df.replace(["-", " ", "na"], np.nan)

3. Creating a DataFrame from lists or from scratch

a. df = pd.DataFrame()

Months = ["jan", "feb", "mar", "may"]

Days = [1, 2, 3, 5]

df["month"] = Months

df["Days"] = Days

4. Renaming DataFrame columns

df = df.rename(columns={'id_x': 'purchase_id', 'id_y': 'customer_id', 'id': 'product_id'})

rename returns a new DataFrame, so assign the result back (or pass inplace=True).

5. Count how many values failed to convert

print(pd.to_datetime(combinedData['purch_date'], errors='coerce').isnull().value_counts())

print(pd.to_datetime(combinedData['purch_date'], errors='coerce'))

errors="coerce": replaces values that cannot be parsed with NaT (the datetime equivalent of np.nan). To drop those rows, call dropna afterwards.

errors="raise" (the default): raises an exception at the first value that cannot be parsed.
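The cleaning steps above can be tried end to end on a small made-up table. This is a sketch under assumptions: the DataFrame `raw` and all its values are invented for illustration, and only the pandas calls come from the notes.

```python
import numpy as np
import pandas as pd

# Invented sample data with three kinds of problems.
raw = pd.DataFrame({
    "score": [80.0, np.nan, 90.0, np.nan, 70.0],
    "grade": ["A", "-", "na", "B", " "],
    "purch_date": ["2020-01-15", "not a date", "2020-03-01",
                   "2020-04-20", "2020-13-45"],
})

# Step 1: three ways to fill the missing scores.
by_interpolate = raw["score"].interpolate()          # midpoint of neighbours
by_mean = raw["score"].fillna(raw["score"].mean())   # column mean
by_ffill = raw["score"].ffill()                      # previous value

# Step 2: turn non-standard markers into real NaN so isnull() sees them.
cleaned_grade = raw["grade"].replace(["-", " ", "na"], np.nan)

# Step 5: unparseable dates become NaT, which isnull() can then count.
parsed = pd.to_datetime(raw["purch_date"], errors="coerce")
n_bad_dates = int(parsed.isnull().sum())

print(by_interpolate.tolist())
print(by_mean.tolist())
print(by_ffill.tolist())
print(int(cleaned_grade.isnull().sum()), n_bad_dates)
```

Comparing the three filled columns side by side shows how much the choice of fill method changes the data, which is worth checking before picking one.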


R coding:

  1. numeric(length) → creates a numeric (double) vector of the given length, filled with zeros

2. names(x): get the names of an object

3. cumsum and related functions: return a vector whose elements are the cumulative sums, products, minima or maxima of the elements of the argument.

cumsum(x) / cumprod(x) / cummax(x) / cummin(x)

cumsum(!is.na(x)): returns the running count of non-NA values in x, since TRUE counts as 1 and FALSE as 0

x<-c(1, 2, 0, NA, 4, NA, NA, 6)
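For this x, `cumsum(!is.na(x))` gives 1 2 3 3 4 4 4 5. For anyone moving between R and pandas, a rough Python equivalent of the same idiom is sketched below; the variable names are assumptions for illustration.

```python
import pandas as pd

# Same values as the R vector; None plays the role of NA.
x = pd.Series([1, 2, 0, None, 4, None, None, 6])

# notna() gives True/False per element; cumsum() treats True as 1.
running_count = x.notna().cumsum()
print(running_count.tolist())
```

The running count stays flat wherever a value is missing, which makes it handy for labelling runs of consecutive missing values.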


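Purpose of this post: a personal reflection on what I have learned and felt after entering a new field, and a summary of the first three months of MSBA coursework.

Background: liberal arts undergraduate (major in political science and international relations, minor in economics), one year on exchange in the US, and three years of work experience in mainland China, mainly as an international sales rep / merchandiser in retail trading. To strengthen my data analysis and hard skills, I decided to pursue a one-year MSBA.

MSBA program: the one-year program is completed over three terms, each about 12 weeks long on average. A 3-credit course runs for 12 weeks; a 1.5-credit course runs for 6 weeks. T1 consists of five 3-credit courses.

  1. AI and Big data in bz (3 credits | 3 hr of class per week)

General Level: 7/10 | Growth Level: 7/10

This course covers how to apply machine learning algorithms to process data. The main tool used is googl …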


  1. Complete the Kaggle Python course

Learn:

  1. print("happy", end=" ") (the default for end is \n)
  2. Define a function with a docstring to explain what it does
  3. Turn a string into a list with split; combine words together with join
datestr = '1956-01-31'
year, month, day = datestr.split('-')
'/'.join([month, day, year])

4. The dict.items() method lets us iterate over the keys and values of a dictionary simultaneously.

# planet_to_initial is a dict mapping planet names to initials, defined earlier in the course
for planet, initial in planet_to_initial.items():
    print("{} begins with \"{}\"".format(planet.rjust(10), initial))

5. enumerate

It allows us to loop over something and have an automatic counter.

my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)

# Output:
# 1…
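The first two points in the list (print with end, and a function with a docstring) have no example above, so here is a small sketch that combines them with split, join, items() and enumerate. The function name `to_us_date` and the sample dictionary are assumptions for illustration, not taken from the Kaggle course.

```python
def to_us_date(datestr):
    """Convert an ISO date string 'YYYY-MM-DD' to 'MM/DD/YYYY'."""
    year, month, day = datestr.split("-")
    return "/".join([month, day, year])


# Hypothetical sample dictionary for illustration.
planet_to_initial = {"Mercury": "M", "Venus": "V", "Earth": "E"}

# enumerate(..., 1) numbers the (key, value) pairs starting from 1.
lines = [
    f"{i}. {planet} begins with {initial}"
    for i, (planet, initial) in enumerate(planet_to_initial.items(), 1)
]

print("happy", end=" ")  # end=" " replaces the default newline
print("coding")          # so both words appear on one line: happy coding
print(to_us_date("1956-01-31"))
print("\n".join(lines))
```

The docstring is what `help(to_us_date)` displays, which is why the course recommends writing one for every function.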

Ally
