CheatSheet For Data Analytics
🚀 Python Cheat Sheet
### Basic Syntax & Data Structures
Variables: `x = 5`, `name = "John"`
Data Types: Integer, Float, String, Boolean
Lists: `my_list = [1, 2, 3]`, `my_list.append(4)`, `my_list[0]`, `len(my_list)`
Dictionaries: `my_dict = {'key': 'value'}`, `my_dict['key']`, `my_dict.keys()`, `my_dict.values()`
Tuples: `my_tuple = (1, 2, 3)`, immutable
Sets: `my_set = {1, 2, 3}`, `my_set.add(4)`, unique elements
### Control Flow
If-Else: `if condition: ... elif condition: ... else: ...`
Loops: `for item in iterable: ...`, `while condition: ...`
### Functions & Lambda Expressions
Defining Functions: `def my_function(): ...`
Lambda Functions: `lambda arguments: expression`, e.g., `lambda x: x * 2`
### Pandas for Data Manipulation
Creating DataFrames: `import pandas as pd`, `df = pd.DataFrame(data)`
Reading Data: `pd.read_csv('file.csv')`, `pd.read_excel('file.xlsx')`
Data Selection: `df['column']`, `df[['col1', 'col2']]`, `df.iloc[rows, columns]`
Filtering: `df[df['column'] > value]`
Grouping: `df.groupby('column').agg({'column2': 'sum'})`
Pivot Tables: `df.pivot_table(index='column', values='column2', aggfunc='mean')`
Merging & Joining: `pd.merge(df1, df2, on='column', how='inner')`
### NumPy for Numerical Computing
Arrays: `import numpy as np`, `np.array([1, 2, 3])`
Array Operations: Element-wise operations, `+`, `-`, `*`, `/`
Statistics: `np.mean(array)`, `np.median(array)`, `np.std(array)`
### Matplotlib & Seaborn for Data Visualization
Matplotlib Basics: `import matplotlib.pyplot as plt`, `plt.plot(x, y)`, `plt.show()`
Seaborn: `import seaborn as sns`, `sns.lineplot(x='x_column', y='y_column', data=df)`
Plot Customization: Title, labels, legends
### Scikit-Learn for Machine Learning
Preparing Data: `from sklearn.model_selection import train_test_split`, `X_train, X_test, y_train, y_test = train_test_split(X, y)`
Model Training: `from sklearn.linear_model import LinearRegression`, `model = LinearRegression()`, `model.fit(X_train, y_train)`
Prediction: `predictions = model.predict(X_test)`
Evaluation: `from sklearn.metrics import mean_squared_error`, `mean_squared_error(y_test, predictions)`
### General Best Practices
Code Organization: Functions, classes, and modules for reusable code
Version Control: Use Git for version control and collaboration
Virtual Environments: Use `venv` or `conda` for managing project-specific dependencies
### Additional Libraries
Scrapy for web scraping, BeautifulSoup for HTML parsing
SQLAlchemy for database interaction
Plotly and Dash for interactive visualizations and web applications
------- end -------
### Basic SQL Syntax
SELECT: `SELECT column1, column2 FROM table_name;` - Retrieves specific columns from a table.
SELECT DISTINCT: `SELECT DISTINCT column1 FROM table_name;` - Returns unique values in the specified column.
WHERE: `SELECT * FROM table_name WHERE condition;` - Filters the result set to only include rows where the condition is TRUE.
ORDER BY: `SELECT * FROM table_name ORDER BY column1 ASC|DESC;` - Sorts the result set in ascending or descending order.
### Aggregation Functions
COUNT: `SELECT COUNT(column_name) FROM table_name WHERE condition;` - Returns the number of input rows that match a specific condition.
SUM: `SELECT SUM(column_name) FROM table_name WHERE condition;` - Returns the total sum of a numeric column.
AVG: `SELECT AVG(column_name) FROM table_name WHERE condition;` - Returns the average value of a numeric column.
MIN/MAX: `SELECT MIN(column_name) FROM table_name;` / `SELECT MAX(column_name) FROM table_name;` - Returns the smallest/largest value of the selected column.
### Joining Tables
INNER JOIN: `SELECT columns FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;` - Selects records that have matching values in both tables.
LEFT JOIN (LEFT OUTER JOIN): `SELECT columns FROM table1 LEFT JOIN table2 ON table1.column_name = table2.column_name;` - Returns all records from the left table, and the matched records from the right table.
RIGHT JOIN (RIGHT OUTER JOIN): `SELECT columns FROM table1 RIGHT JOIN table2 ON table1.column_name = table2.column_name;` - Returns all records from the right table, and the matched records from the left table.
FULL JOIN (FULL OUTER JOIN): `SELECT columns FROM table1 FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;` - Returns all records when there is a match in either left or right table.
### Subqueries and CTEs
Subquery: `SELECT column_name FROM table_name WHERE column_name IN (SELECT column_name FROM table_name WHERE condition);` - A query nested inside another query.
Common Table Expressions (CTE): `WITH cte_name AS (SELECT column_name FROM table_name) SELECT * FROM cte_name;` Temporary result set that you can reference within another SELECT, INSERT, UPDATE, or DELETE statement.
### Window Functions
ROW_NUMBER: `SELECT ROW_NUMBER() OVER (ORDER BY column_name) FROM table_name;` - Assigns a unique sequential integer to rows within a result set.
RANK / DENSE_RANK: `SELECT RANK() OVER (ORDER BY column_name) FROM table_name;` - Assigns a rank to each row within a partition of a result set.
LEAD/LAG: `SELECT LEAD(column_name) OVER (ORDER BY column_name) FROM table_name;` - Accesses data from a subsequent/preceding row in the same result set.
### Data Manipulation
INSERT INTO: `INSERT INTO table_name (column1, column2) VALUES (value1, value2);` - Inserts new rows into a table.
UPDATE: `UPDATE table_name SET column1 = value1 WHERE condition;` - Updates existing records in a table.
DELETE: `DELETE FROM table_name WHERE condition;` - Deletes existing records in a table.
### Data Definition and Control
CREATE TABLE: `CREATE TABLE table_name (column1 datatype, column2 datatype);` - Creates a new table.
ALTER TABLE: `ALTER TABLE table_name ADD column_name datatype;` - Adds a new column
------- end -------
🚀 SQL Cheat Sheet
### Basic Formulas
SUM: `=SUM(A1:A10)` - Adds all numbers in a range of cells.
AVERAGE: `=AVERAGE(B1:B10)` - Calculates the average of a range of cells.
MIN/MAX: `=MIN(C1:C10)`, `=MAX(C1:C10)` - Finds the minimum/maximum value in a range.
### Lookup & Reference
VLOOKUP: `=VLOOKUP(value, table_range, col_index_num, [range_lookup])` - Searches for a value in the first column of a table and returns a value in the same row from a specified column.
INDEX: `=INDEX(array, row_num, [col_num])` - Returns the value of a cell in a specific row and column within a range.
MATCH: `=MATCH(lookup_value, lookup_array, [match_type])` - Searches for a specified item in a range of cells, and then returns the relative position of that item.
### Text Functions
CONCATENATE / CONCAT: `=CONCATENATE(A1, " ", B1)`, `=CONCAT(A1, " ", B1)` - Combines two or more strings into one string.
LEFT / RIGHT: `=LEFT(text, number_of_characters)`, `=RIGHT(text, number_of_characters)` - Extracts a specified number of characters from a text string starting from the left/right.
### Logical
IF: `=IF(logical_test, value_if_true, value_if_false)` - Checks whether a condition is met, and returns one value if TRUE, and another value if FALSE.
AND / OR: `=AND(logical1, [logical2], ...)`, `=OR(logical1, [logical2], ...)` - Returns TRUE if all arguments are TRUE / if any arguments are TRUE.
### Date and Time
TODAY: `=TODAY()` - Returns the current date.
NOW: `=NOW()` - Returns the current date and time.
YEAR / MONTH / DAY: `=YEAR(A1)`, `=MONTH(A1)`, `=DAY(A1)` - Extracts the year/month/day from a date.
### Financial
NPV: `=NPV(rate, value1, [value2], ...)` - Calculates the net present value of an investment based on a constant interest rate.
PMT: `=PMT(rate, nper, pv, [fv], [type])` - Calculates the payment for a loan based on constant payments and a constant interest rate.
### Statistical
COUNT: `=COUNT(A1:A10)` - Counts the number of cells that contain numbers.
COUNTIF / COUNTIFS: `=COUNTIF(range, criteria)`, `=COUNTIFS(range1, criteria1, [range2, criteria2],...)` - Counts the number of cells that meet a criterion; can use multiple criteria.
SUMIF / SUMIFS: `=SUMIF(range, criteria, [sum_range])`, `=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2],...)` - Adds the cells specified by a given condition(s).
### Data Analysis
PivotTables: Essential for summarizing, analyzing, exploring, and presenting summary data. PivotTables can dynamically sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data.
Data Validation: Use Data Validation (`Data` > `Data Validation`) to create dropdown menus, restrict cell input values to certain types, or create rules that dictate what can be entered.
### Array Formulas
Array Formula: Entered by pressing `Ctrl` + `Shift` + `Enter`. Useful for performing multiple calculations on one or more items in an array. For example, `{=SUM(IF(A1:A10>5, A1:A10))}` sums only the numbers greater than 5 in the range.
------- end -------
🚀 Excel Cheat Sheet
### 1. Mean (Arithmetic Average)
Definition: The sum of all values divided by the number of values.
Example: For sales values $30, $50, $10, $170, $20, $80, the mean is ($30 + $50 + $10 + $170 + $20 + $80) / 6 = $60.
### 2. Median
Definition: The middle value of an ordered dataset.
Example: For the dataset $10, $20, $30, $50, $80, $170, the median is $40 (average of $30 and $50).
### 3. Mode
Definition: The most frequently occurring value in a dataset.
Example: For the datasets 3, 5, 1, 5, 2, 5, 7, the mode is 5.
### 4. Variance
Definition: A measure of the spread of numbers in a dataset; the average of the squared differences from the mean.
Example: For sales values $30, $50, $10, $170, $20, $80 with a mean of $60, the variance is [(30-60) ² + (50-60)² + (10-60)² + (170-60)² + (20-60)² + (80-60)²] / 6 = 2833.33.
### 5. Standard Deviation**
Definition: The square root of the variance; measures the average distance from the mean.
Example: For the same sales values, the standard deviation is √2833.33 = 53.22.
### 6. Descriptive Statistics
Definition: Statistical methods that describe and summarize data.
Example: Calculating the mean, median, mode, variance, and standard deviation for a dataset.
### 7. Exploratory Data Analysis (EDA)
Definition: Visualizing and exploring data to find patterns, trends, and relationships.
Example: Calculating the mean, median, mode, variance, and standard deviation for a dataset.
### 8. Predictive Statistics (Inferential Statistics)
Definition: Using data from a sample to make inferences about a population.
Example: Creating histograms, scatter plots, and box plots to visualize sales data.
### 9. Data Collection
Definition: Gathering relevant data from various sources.
Example: Collecting sales data through surveys, databases, and records.
### 10. Data Cleaning and Preprocessing
Definition: Ensuring data quality and reliability by removing outliers, correcting missing values, and formatting data.
Example: Removing a sales outlier value of $1000 from a dataset of typical sales values around $50.
### 11. Data Structure and Organization
Definition: Understanding the format and organization of data.
Example: Identifying data as numeric or categorical and arranging it in rows and columns.
------- end -------