Only My Notes

Tag

Python

#python

Articles with this tag

Intro to Spark SQL in Python

Nov 21, 2021 · 1 min read

Spark SQL is a component of Apache Spark that works with tabular data. # Load data from file df = spark.read.csv("trains.csv", header=True) # Create...

AWS in Python with Boto3

Oct 31, 2021 · 3 min read

To interact with AWS in Python, there is the Boto3 library. 1. AWS S3 S3 is the AWS Storage solution. import boto3 # Generate the boto3 client for...

PySpark Fundamentals

Oct 17, 2021 · 4 min read

Spark = tool for doing parallel computation with large datasets. Spark lets you spread data and computations over clusters with multiple nodes. pyspark...

Basic Airflow in Python

Aug 5, 2021 · 6 min read

Airflow: platform to program workflows. DAG: workflow made up of tasks with dependencies. Define a DAG in Python: from airflow.models import DAG from...

Data Processing in Shell

Jul 10, 2021 · 4 min read

Download Data using curl https://curl.haxx.se/download.html curl -O https://websitename.com/file001.txt #-O -> download file with its name curl...

Data Ingestion with pandas

Apr 17, 2021 · 3 min read

1. Importing Data from Flat Files and Spreadsheets read_csv for all flat files import pandas as pd data = pd.read_csv('file.csv') data =...
