I record my web scraping learning in this story and want to apply it to scraping some Hong Kong news websites, because I worked in advertising before and this area is familiar to me. This tutorial scrapes timejobs.com.

Here are some points I think are important:

First thing to…
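Scraping tutorials usually fetch a page with requests and parse it with BeautifulSoup. To keep things self-contained, here is a minimal sketch of just the parsing step using the standard library's html.parser on an inline HTML string. The markup is made up and does not reflect timejobs.com's real page structure; JobTitleParser and extract_titles are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

# Made-up markup standing in for a job-listing page (NOT the real site's HTML)
SAMPLE_HTML = """
<ul>
  <li class="job"><h2>Data Engineer</h2><span class="company">Acme</span></li>
  <li class="job"><h2>Data Analyst</h2><span class="company">Globex</span></li>
</ul>
"""


class JobTitleParser(HTMLParser):
    """Collects the text inside every <h2> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.titles: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data  # text may arrive in several chunks


def extract_titles(html: str) -> List[str]:
    parser = JobTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


print(extract_titles(SAMPLE_HTML))  # ['Data Engineer', 'Data Analyst']
```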

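json.loads() takes in a JSON-formatted string and returns a Python object (a JSON object becomes a dict).

json.dumps() takes in a Python object and returns a JSON-formatted string.

In other words, json.loads decodes (parses) a JSON string into Python data, while json.dumps encodes (serializes) Python data into a JSON string. json.load and json.dump do the same with file objects instead of strings.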

https://www.educative.io/edpresso/what-is-the-difference-between-jsonloads-and-jsondumps
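A minimal round trip with Python's standard json module; the dictionary is made-up sample data.

```python
import json

# Made-up sample data: a Python dict
record = {"title": "Data Engineer", "openings": 3, "remote": False}

# json.dumps: Python object -> JSON-formatted str (encode / serialize)
text = json.dumps(record)
print(type(text).__name__, text)  # str {"title": "Data Engineer", "openings": 3, "remote": false}

# json.loads: JSON-formatted str -> Python object (decode / parse)
parsed = json.loads(text)
print(type(parsed).__name__, parsed)  # dict {'title': 'Data Engineer', 'openings': 3, 'remote': False}
```

Note how Python's False becomes JSON's false and back again; that type mapping is exactly what the encode and decode steps take care of.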

An HTTP request (e.g. a POST) containing a parameter can trigger a cloud function.

Use a URL to trigger the cloud function (CF); put the parameter in the query string, after the ? in the link.

Example: https://asia-east2-project-id.cloudfunctions.net/cfname?date=2020-01-01
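Inside the Cloud Function, Flask's request.args already holds the parsed query string. To show what happens to the part after ?, this standalone sketch builds a trigger URL and parses it back with the standard urllib.parse module. The base URL is a placeholder in the usual https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME form, and build_trigger_url and read_params are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder URL: replace region, project ID and function name with your own
BASE_URL = "https://asia-east2-project-id.cloudfunctions.net/cfname"


def build_trigger_url(base_url: str, **params: str) -> str:
    """Append the parameters to the URL as a query string (the part after '?')."""
    return f"{base_url}?{urlencode(params)}"


def read_params(url: str) -> dict:
    """Roughly what request.args.to_dict() returns: one value per parameter name."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_trigger_url(BASE_URL, date="2020-01-01")
print(url)               # https://asia-east2-project-id.cloudfunctions.net/cfname?date=2020-01-01
print(read_params(url))  # {'date': '2020-01-01'}
```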

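import re
from google.cloud import storage


def function(request):
    args = request.args.to_dict()  # request parameters to dict
    date = args.get("date")  # parameter "date", e.g. "2020-01-01"
    log_date = date  # assumption: the original used log_date here for the same value
    log_type = "..."  # placeholder: log_type is used below but never defined in the original notes
    pattern = r""  # placeholder: the original regex was cut off; an empty pattern matches every name
    gcs = storage.Client()  # Google Cloud Storage client
    bucket = gcs.get_bucket("abc")  # source bucket
    dest_bucket = gcs.get_bucket("xxx")  # destination bucket
    for blob in bucket.list_blobs():  # each blob in the source bucket
        if re.match(pattern, blob.name):  # condition: search blob.name
            filename = blob.name.split("/")[-1]  # assumption: file name without its folder prefix
            dest_name = "{}/{}/{}".format(log_date, log_type, filename)
            new_blob = bucket.copy_blob(blob, dest_bucket, dest_name)  # copy blob to the dest bucket
    return "OK"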
  1. ''.join()
  2. regular expressions (re)
  3. map(function, iterable)
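A quick sketch of the three tools listed above, run on made-up strings.

```python
import re

# 1. "".join(): concatenate an iterable of strings, using the string as the separator
parts = ["2020", "01", "01"]
joined = "-".join(parts)
print(joined)  # 2020-01-01

# 2. re: find every substring that matches a regular expression
text = "log_2020-01-01.json, log_2020-01-02.json"
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)  # ['2020-01-01', '2020-01-02']

# 3. map(function, iterable): apply a function to each item (returns a lazy iterator)
numbers = list(map(int, "48.49.50.51".split(".")))
print(numbers)  # [48, 49, 50, 51]
```

In Python 3, map returns an iterator rather than a list, which is why the result is wrapped in list().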

I solved my first LeetCode challenge in Python. Congratulations!

I recorded my thinking process for the ‘Two Sum’ exercise.

At first, I noticed that all three example outputs use consecutive indices, e.g. nums[1] + nums[2], so I thought one for loop would be enough for this question.

I got the correct answer after clicking ‘Run Code’. Great!

However, I had neglected the case where the two indices are not consecutive.
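One common way to handle non-consecutive pairs is to remember every value seen so far in a dict, so each number can be matched with any earlier one in a single pass. This is a sketch of that hash-map approach, not necessarily the solution I ended up submitting; two_sum is a hypothetical name and the last input is made up so that its answer is not consecutive.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices [i, j] (i < j) such that nums[i] + nums[j] == target."""
    seen: Dict[int, int] = {}  # value -> index where it was seen
    for j, num in enumerate(nums):
        complement = target - num
        if complement in seen:  # the partner can be at any earlier index, not only j - 1
            return [seen[complement], j]
        seen[num] = j
    return []  # no pair found


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 2, 4], 6))       # [1, 2]
print(two_sum([3, 3], 6))          # [0, 1]
print(two_sum([1, 5, 8, 3], 4))    # [0, 3]  (non-consecutive indices)
```

This runs in O(n) time, compared with O(n²) for checking every pair with two nested loops.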

1. Create a new project in GitLab

2. Create new branches

3. Push to Git

4. Change to “Google-source”

5. Add ‘File’ and copy the files into the Git folder

6. Commit ‘File’

7. Push to Git

Git Commands

How to delete files

rm <file>

to delete a file

rm -rf .git

to delete the .git folder (this removes all Git history from the project)

Create a new branch

git checkout -b develop

create a new branch ‘develop’ and switch to it

Check the current path

pwd

Leave the folder

cd ../

go up one level (to the parent folder)

Enter a folder

cd <folder>

go into the folder

Show files

ls -al

list all files, including hidden ones, in long format

Check status

git status

show which files are modified, staged, or untracked

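IP address as a string: '48.49.50.51'

Use

NET.SAFE_IP_FROM_STRING(addr_str)

to convert the string to an IP address in binary form. It returns NULL instead of raising an error if the string is not a valid address.

Return data type: BYTES

Then use

NET.IPV4_TO_INT64(addr_bin)

to convert the bytes to INT64, so the address can be checked with BETWEEN against a start_ip and end_ip range.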
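In BigQuery the two calls are nested in SQL, e.g. NET.IPV4_TO_INT64(NET.SAFE_IP_FROM_STRING(ip)) BETWEEN start_ip AND end_ip, assuming ip is a string column and start_ip and end_ip are INT64 columns. To show what those two steps compute, here is a Python analogue built on the standard ipaddress module. It is not BigQuery's implementation; the function names only mirror the SQL ones, and ip_in_range is a hypothetical helper.

```python
import ipaddress
from typing import Optional


def safe_ip_from_string(addr_str: str) -> Optional[bytes]:
    """Mimics NET.SAFE_IP_FROM_STRING: address string -> packed bytes, None if invalid."""
    try:
        return ipaddress.ip_address(addr_str).packed
    except ValueError:
        return None


def ipv4_to_int64(addr_bin: bytes) -> int:
    """Mimics NET.IPV4_TO_INT64 for 4-byte input: big-endian bytes -> integer."""
    if len(addr_bin) != 4:
        raise ValueError("expected a 4-byte IPv4 address")
    return int.from_bytes(addr_bin, "big")


def ip_in_range(addr_str: str, start_ip: str, end_ip: str) -> bool:
    """Is addr_str between start_ip and end_ip (inclusive)? Assumes valid IPv4 bounds."""
    addr_bin = safe_ip_from_string(addr_str)
    if addr_bin is None or len(addr_bin) != 4:
        return False  # invalid or non-IPv4 input
    start = ipv4_to_int64(safe_ip_from_string(start_ip))
    end = ipv4_to_int64(safe_ip_from_string(end_ip))
    return start <= ipv4_to_int64(addr_bin) <= end


print(safe_ip_from_string("48.49.50.51"))  # b'0123'
print(ipv4_to_int64(b"0123"))              # 808530483
print(ip_in_range("48.49.50.51", "48.0.0.0", "48.255.255.255"))  # True
```

Comparing integers is what makes the range check work: 48.49.50.51 becomes 48·2²⁴ + 49·2¹⁶ + 50·2⁸ + 51 = 808530483, so ordinary numeric comparison orders addresses correctly.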

Data

Continuous data vs. discrete data

If all the data in a dataset are discrete, some algorithms are not suitable: KNN, K-means, … (those based on distance).

Discrete data: 1, 2, 3

How do we calculate the distance between them?

Association rules could be applied to this dataset.
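Association rules rely on how often items occur together rather than on distances, which is why they fit discrete data. A minimal sketch of the two basic measures, support and confidence, on made-up transactions (not the dataset from these notes); a full algorithm such as Apriori would search for all itemsets above a minimum support.

```python
from typing import List, Set

# Made-up transactions: each one is a set of discrete items observed together
transactions: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(data: List[Set[str]], itemset: Set[str]) -> float:
    """Share of transactions that contain every item in itemset."""
    return sum(1 for t in data if itemset <= t) / len(data)


def confidence(data: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Rule A -> C: support of A and C together divided by support of A."""
    return support(data, antecedent | consequent) / support(data, antecedent)


print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.6666666666666666
```

Read the last line as: in this toy data, 2 out of the 3 transactions that contain bread also contain milk.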

import pandas as pd
df = pd.read_csv('data.csv')

Kenneth Law

1 Year Self-study in Data Science | want to become a data engineer | write posts to record my improvement
