Posts

Wordcount project in Spark with JAVA using Maven and Gradle

Follow these steps to set up a Wordcount project in Spark with Java using Maven and Gradle.

Download winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin and move it to F:\BigData\hadoop\bin.

Download pre-built Spark from http://spark.apache.org/downloads.html, extract it, and move it to F:\BigData.

Set environment variables:
1. SPARK_HOME - F:\BigData\spark-3.0.1-bin-hadoop2.7
2. HADOOP_HOME - F:\BigData\hadoop

Add to Path:
1. %SPARK_HOME%\bin
2. %HADOOP_HOME%\bin

Confirm the installation from CMD. Enter:
spark-shell
It should show the Spark version and the shell should start.

Set up the Hadoop scratch directory. Create the following folder: C:\tmp\hive. Navigate to F:\BigData\hadoop\bin and set permissions by typing:
winutils.exe chmod -R 777 C:\tmp\hive

For a Maven project, create a new project: D:\Codes\Spark\First project. Create the class SimpleApp.java in D:\Codes\Spark\First project\src\main\java:
/* SimpleApp.java */ import org.apache.spark.api.java.*;...
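Before running spark-shell, the environment setup described above can be sanity-checked with a small script. The following is only a sketch: the function name check_spark_env and its path_exists parameter are made up for illustration, and it assumes the Windows layout from the steps above (a bin folder under each home directory, with winutils.exe under HADOOP_HOME).

```python
import os
from pathlib import PureWindowsPath


def check_spark_env(env, path_exists=os.path.exists):
    """Return a list of problems found; an empty list means the setup looks fine.

    env is a mapping of environment variables (for example dict(os.environ)).
    path_exists is injectable so the check can be tried without real files.
    """
    problems = []

    # Both home variables from the setup steps must be defined.
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        if not env.get(var):
            problems.append(f"{var} is not set")

    # winutils.exe is expected in %HADOOP_HOME%\bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        winutils = str(PureWindowsPath(hadoop_home) / "bin" / "winutils.exe")
        if not path_exists(winutils):
            problems.append(f"winutils.exe not found at {winutils}")

    # Windows separates Path entries with ';' and ignores case.
    raw_path = env.get("Path", env.get("PATH", ""))
    entries = {p.strip().rstrip("\\").lower() for p in raw_path.split(";") if p.strip()}
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        home = env.get(var)
        if home:
            expected = str(PureWindowsPath(home) / "bin")
            if expected.lower() not in entries:
                problems.append(f"{expected} is not on Path")

    return problems


if __name__ == "__main__":
    for problem in check_spark_env(dict(os.environ)):
        print("WARNING:", problem)
```

If the script prints nothing, both variables are set, winutils.exe is in place, and both bin folders are on Path.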

Video crop/cut using python

To cut, merge or split videos you can use Bandicut, a free application for Windows. However, it puts its own ad at the end of the video. You can use the following Python code to cut off the advertisement part. Times are given in seconds.

Link for Bandicut software: https://www.bandicam.com/bandicut-video-cutter/

To install moviepy:
!pip3 install --trusted-host pypi.python.org moviepy
!pip3 install imageio-ffmpeg

Python code:
import imageio
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip
start_time=0  # start time: 0 seconds
end_time=462  # end time: where the advertisement part begins
ffmpeg_extract_subclip("intial_video.mkv", start_time, end_time, targetname="video_edited.mkv")

Thanks
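Since the start and end times are given in seconds while video players usually show MM:SS, a small converter can help when picking the cut point. This is a minimal sketch; the helper name to_seconds is made up for illustration and is not part of moviepy.

```python
def to_seconds(timestamp):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a whole number of seconds."""
    parts = [int(p) for p in timestamp.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0
    # Each step shifts the running total one unit up (hours -> minutes -> seconds).
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


end_time = to_seconds("7:42")
print(end_time)  # 462
```

The resulting value can be passed as end_time to ffmpeg_extract_subclip.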

Numpy and Pandas initialization and modification

The following methods can be used to initialise an array:

1. Initialise an array (the contents are uninitialised)
from numpy import ndarray
a = ndarray((5,),int)

2. Initialise a string numpy array
import numpy as np
country = np.array(['USA', 'Japan', 'UK', '', 'India', 'China'])

3. Using numpy empty
chars = np.empty((3),dtype=str)

4. Make a list and convert it to a numpy array
country=['USA', 'Japan', 'UK', '', 'India', 'China']
ll=[]
for i in country:
  ll.append(i)
country=np.asarray(ll,dtype=str)

Methods to modify a numpy array:

1. Expand dimension
X_valcnn= np.expand_dims(X_val, axis=2)

2. Reshape
X_traincnn = np.reshape(X_traincnn, (X_traincnn.shape[0],48,48,1))

3. Flatten an ndarray to a 1D array
y_train=y_train.ravel()

4. Convert a DataFrame/Series to numpy
X_traincnn.to_numpy()
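To see what these calls do to the array shapes, here is a small sketch on stand-in data. The arrays X and y and their sizes (10 samples of 48*48 values) are assumptions chosen for illustration, not data from the post.

```python
import numpy as np

# Hypothetical stand-in data: 10 samples, each a flattened 48x48 image.
X = np.zeros((10, 48 * 48))
y = np.zeros((10, 1))

# Reshape needs the sample count as an integer, hence shape[0].
X_images = np.reshape(X, (X.shape[0], 48, 48, 1))
print(X_images.shape)    # (10, 48, 48, 1)

# expand_dims inserts a new axis of length 1 at the given position.
X_expanded = np.expand_dims(X, axis=2)
print(X_expanded.shape)  # (10, 2304, 1)

# ravel flattens a column vector into a 1D array.
y_flat = y.ravel()
print(y_flat.shape)      # (10,)

# Caveat: np.empty with dtype=str holds strings of length 1,
# so longer values are silently truncated on assignment.
chars = np.empty(3, dtype=str)
chars[0] = "USA"
print(chars[0])          # U
```

If full strings are needed, building the array from a list with np.array, as in method 2, lets numpy pick a wide enough string type.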

MS Word Page numbering for Reports

We have all faced problems numbering the pages of a report, where the numbering starts from page 2, numbered (i), and the main content is then numbered from page (1). This can be difficult to set up. The images relate to Microsoft Word 2007.

Steps to follow:
1. Click on the Paragraph symbol, which can be found in the Home tab. You will see the section breaks that are already applied. Remove them by simply deleting them (Backspace).
2. Add section breaks one page before the Index and Introduction pages. The option can be found in the Page Layout tab.
3. Add a page number after clicking on a page in the required section (in this case, the List of abbreviations page).
a) Click at image marked point (1).
b) Select the page number style from image marked point (2). It will be applied.
c) Select the page number from the page and format according to the need from image mark...